Section I

Basics of Designing Virtual 
Reality Systems 



Chapter 1 

Introduction: VR in a Nutshell 



What Is VR? 

"Virtual Reality (VR)" 1 is a field of study that aims to create a system that 
provides a synthetic experience for its user(s). The experience is dubbed 
"synthetic," "illusory," or "virtual" because the sensory stimulation to the 
user is simulated and generated by the "system." For all practical purposes, 
the system usually consists of various types of displays 2 for delivering the 
stimulation, sensors to detect user actions, and a computer that processes the 
user action and generates the display output. To simulate and generate
virtual experiences, developers often build computer models, also known
as "virtual worlds" or "virtual environments (VEs)," which are, for instance,
spatially organized computational objects (aptly called virtual objects)
presented to the user through various sensory display systems such as
monitors, sound speakers, and force feedback devices.

One important component of a successful VR system is the provision of 
interaction, to allow the user not just to feel a certain sensation, but also to 
change and affect the virtual world in some way. Figure 1.1 captures the 
basic architecture of a VR system and various associated terminologies. 
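
The sense-process-display cycle just described can be sketched in a few
lines. This is only a minimal illustration, assuming a single head-yaw reading
per frame and a text description in place of a real display; the names
VirtualWorld, read_sensor, update, render, and run are invented for this
example and do not belong to any VR library.

```python
from dataclasses import dataclass


@dataclass
class VirtualWorld:
    """Hypothetical world state: only the user's viewing direction."""
    view_yaw_deg: float = 0.0


def read_sensor(samples, frame):
    """Stand-in for a tracker: return the head yaw recorded for this frame."""
    return samples[frame]


def update(world, head_yaw_deg):
    """Process the user action: reflect the head movement in the world."""
    world.view_yaw_deg = head_yaw_deg % 360.0


def render(world):
    """Stand-in for a display: describe what would be shown."""
    return f"view rendered at yaw {world.view_yaw_deg:.1f} deg"


def run(samples):
    """Sense, process, and display once per frame."""
    world = VirtualWorld()
    frames = []
    for frame in range(len(samples)):
        update(world, read_sensor(samples, frame))
        frames.append(render(world))
    return frames
```

Calling run([0.0, 15.0, 370.0]) yields one rendered description per frame. A
real system would replace the recorded samples with live sensor readings and
the strings with visual, aural, and haptic output.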

Goals and Applications of VR 

Of what value is a virtual experience? Obviously, it allows people to
experience things that would otherwise be very difficult or even impossible
to attain in real life, like going to the South Pole or to the Moon. The virtual
experience can even be something imaginary and abstract (rather than
real-life inspired), such as experiencing an abstract mathematical world or an
imaginary world envisioned by an artist. Thus, it goes without saying that
virtual experiences are useful for many purposes including training,
education, and entertainment.

1 The term "VR" is also used to describe the technology or medium used to create
and convey the synthetic experience, or sometimes even the experience itself.

2 In the VR literature, the term "display" refers not only to the usual "visual"
display but also to any sensory output device.

Figure 1.1. A typical VR system exemplified by a virtual driving simulation. The user
is driving in a virtual world as displayed to him visually through the head-mounted
display, aurally through the earphone, and kinesthetically through the handle
mechanism. The displays are generated and controlled by the computer program. The
computer also accepts user action (e.g., head movement of the user) through sensors
and processes it to reflect it to the virtual environment and the display. A more
abstract view of the system is shown in the lower block diagram.

One source of confusion regarding the concept of VR is whether or how it
differs from 3D (networked) PC/console games, or from terms like
"cyberworld," web-based chatting, etc. Broadly speaking, 3D games are one
type of VR system in the sense that they provide a virtual experience of some
sort. However, most 3D games, for commercial reasons, are still
keyboard-mouse-based or based on simple interfaces, and 3D game designers
are interested in finding a clever scenario that is exciting and engaging so
that it will attract and hook more players. Although one cannot ignore these
aspects in developing a successful VR system, VR scientists are more
interested in reproducing a given experience as faithfully as possible. For
instance, a first-person shooting game should be exciting and engaging,
whereas a virtual battleground (which might have content almost identical to
that of the first-person shooting game) should be scary and tense (as in a
real war). New arcade games that now employ rich 3D graphics, realistic
simulations, sound effects, and physical interfaces are more in tune with the
general goals of VR.

Two Pillars of VR: Presence and 3D Multimodal 
Interaction 

In this line of thinking, one of the important and distinguishing design goals 
of VR is the provision of Presence for creating a vivid virtual experience (and 
even for improved task performance in some cases). Presence (or the sense of 
presence) is defined as the degree to which participants subjectively feel that 
they are somewhere other than their actual physical location because of the 
effects of a computer-generated simulation [Bys99]. The effect is often 
dubbed the "sense of being there" [Hee92]. Humans process external stimuli 
provided to the visual, auditory, haptic, 3 or proprioceptive, 4 sensory system 
and transform the stimuli into an internal representation (or mental model), 
which gives humans the illusion that they are immersed in another space. It 
would be practically impossible (at least with the current technologies) to 
delude any VR user into thinking the VEs they experience are real or perfect. 
Yet with sufficient and clever integration of the sensory stimuli, a sense of
presence can still be elicited in VR users, who obtain a virtual experience
through their ability to conform to the environment, or through the
momentary "suspension of disbelief."

Many researchers have identified key elements that promote presence (see
Figure 1.2 and Table 1.1). Despite a number of different definitions of
presence, it is generally accepted that the following aspects are important
in promoting it: (1) sensory fidelity and richness, (2) degrees of interactivity,
and (3) other psychological cues [Shi03; ISP04]. Sensory fidelity and richness
refer to providing a user with an environment that is as realistic as possible, 
for instance, with a wide field of view (FOV) or immersive display, pictorial 
realism, multimodal feedback, and first-person viewpoint. 

It is quite obvious that display "realism" is important for presence. But
considering the possibility of eliciting presence in a "fake" virtual world
(something with no counterpart in the real world such as Alice's
Wonderland [Car00]), it is natural to search for other contributing factors.
In this regard, interactivity and psychological cues can play equally
important roles in increasing presence. Interactivity refers to the amount of
involvement or capability of the user with respect to experiencing the virtual
world. Typical examples of interaction would include the capability to
navigate throughout the environment, manipulate objects, change their
properties, initiate object simulation (e.g., motions, deformations),
communicate with other entities, and so on. An appropriate interaction design
can increase presence by strengthening the bond or sense of belonging between
the user and the virtual world.

3 The sense of touch and force.

4 The sense of movements in the joints.

Figure 1.2. One of the major contributing factors to creating a vivid virtual
experience is presence, which is strengthened by various factors.

Other psychological cues associated with the design of the virtual world,
such as predictability and consistency, use of auxiliary/background objects,
emotional content, use of plots, and situational awareness, have been
reported to affect presence in varying ways [ISP04]. On the other hand,
distractions from the environment, the cumbersomeness, obtrusiveness, or
novelty of the devices, and simulation sickness can lower the level of the
experience or presence. However, humans have an amazing capability to adapt
to their environment, and such negative effects tend to decrease with time of
exposure to the stimuli (physical or psychological). Table 1.1 summarizes the
various factors that affect presence in a positive or negative way. A VR
system developer, perhaps unlike a 3D game developer, would have to consider
the varying effects of these presence factors to create the best virtual
experience, often with limited computational and hardware resources.
Psychological cues are indirect cues in the sense that they do not promote
presence by themselves, but rather act as a means to cancel negative elements
such as distraction (e.g., when focused, one fails to notice the boundaries of
the narrow display screen).

The second important element in creating a successful virtual experience is
the use of natural and usable interaction (and an appropriately designed
interface). From intuition and from the evolutionary viewpoint, an
interaction scheme that (1) is based on three-dimensional space, (2) involves
the whole body, and (3) takes advantage of the multimodality of human sensory
organs would be the most natural (for instance, versus the current
keyboard-mouse-based desktop interface). It has already been mentioned that
interaction itself and the style of interaction can affect the user's sense
of presence. A badly designed interface can create user discomfort and even
simulation sickness and negatively affect the quality of the virtual
experience. However, given the limitation in hardware and computational
resources, designing the most usable and ergonomic interface is a challenging
task that requires many cycles of trial and error. In many cases, the goals of
naturalness, usability, and even task efficiency conflict in modeling
interaction and designing the interface. A VR system developer must not
overlook these human factors issues and should pursue a user-centered design
approach through careful task analysis, leveraging existing design guidelines
for 3D multimodal interfaces, and even one's own experimentation.

Table 1.1. Possible factors in promoting or demoting the sense of presence.

Sensory fidelity         Number of sensory outputs (multimodality) and
                           consistency among them
                         How realistic each sensory output is (modality realism)
                         Visual: display size/FOV, image quality, object detail
                           and size, depth cues / stereoscopy, etc.
                         Aural: sound quality, spatialization, etc.
                         Kinesthetic: existence and realism of force feedback,
                           extent of force feedback (e.g., point contact vs.
                           areal contact), etc.
                         Stimulation realism of other modality displays
                           (smell, air flow, tactility, etc.)
                         Viewpoint (first person)
                         Simulation/behavior fidelity

Interactivity            Existence of interaction (Can user do anything at all?)
                         Degrees of interaction (How much can user do?)
                         Style of interaction (Is the interaction natural and
                           realistic as in real life?)

Psychological/content    Characters and storyline / Emotion / Arousal
variables (indirect      Willingness to suspend disbelief
influence)               Previous experience
                         Attention and focus

Negative factors         Obtrusiveness of devices
to presence                Heavy HMD, wired and tethered sensors, etc.
                         Interference from real world (e.g., noise)

We can think of a spectrum of "VR-ness," as a function of the level of 
presence and interaction style. Figure 1.3 shows that at one end of the 
spectrum the real world exists, which serves as the basis of an environment 
with the highest possible presence and usability (actually it is possible that 
interfaces which cannot exist in the real world may be more usable). At the 
other end exists an environment such as online text chatting with minimal 
presence and richness in terms of ways to interact.

Figure 1.3. The spectrum of VR-ness.



Building a Virtual Reality System 

Developing and maintaining a VR system is a very difficult task. It requires 
in-depth knowledge in many different disciplines, such as sensing and track- 
ing technologies, stereoscopic displays, multimodal interaction and process- 
ing, computer graphics and geometric modeling, dynamics and physical 
simulation, performance tuning, and so on. The major features and require- 
ments that particularly distinguish VR systems from other software systems 
are (1) the real-time performance requirement, while maintaining an accept- 
able level of realism and presence, (2) the problem of modeling the object's 
appearance and physical properties in addition to, and in relation to, its 
function and behavior, and (3) consideration of many different styles and 
modalities of interaction techniques, according to different tasks and input/ 
output devices. The difficulty lies in the complexity of having to simultan- 
eously consider many system goals, some of which are conflicting. 

Building a VR system usually requires the following stages of effort in an 
iterative fashion. In the first stage, the requirements of the virtual experience 
are analyzed and the overall flow and scene structures are roughly sketched 
including the time and conditions for interaction. Essential input/output 
devices or the required amount of computational power should be estimated. 
Based on the requirements, the major virtual objects are modeled. The 
geometries are created (most often) using computer-aided design tools; 
then the developers program their behaviors using graphics/VR library 
routines. The virtual objects and other computational elements need to be 
organized to form a scene and the scene is programmed to be rendered and
displayed to the user at a reasonably high frame rate (e.g., at 20 Hz to ensure
smooth animation of objects). Special VR sensors and display devices are 
interfaced into the system. The overall system is further refined by analyzing 
the various required interaction tasks and designing particular interfaces for 
them. Finally, various presence factors may be added to enhance the virtual 
experience as much as possible within the bounds of required performance 
(e.g., 20-Hz frame rate). The overall process may be viewed as a classic spiral 
software engineering process [Boe88] as depicted in Figure 1.4 [Seo03]. 
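
To make the performance bound concrete: a 20-Hz frame rate leaves
1000 / 20 = 50 ms per frame for everything the system does in that frame. The
sketch below checks whether a set of per-frame costs fits that budget. The
stage names and their costs are hypothetical values chosen for illustration,
not measurements.

```python
def frame_budget_ms(target_hz):
    """Time available per frame, in milliseconds, at the target frame rate."""
    return 1000.0 / target_hz


def fits_budget(stage_costs_ms, target_hz=20.0):
    """Return (fits, slack_ms) for the summed per-frame stage costs."""
    slack = frame_budget_ms(target_hz) - sum(stage_costs_ms.values())
    return slack >= 0.0, slack


# Hypothetical per-frame costs in milliseconds (illustrative only).
costs = {"read sensors": 2.0, "simulate": 18.0, "render": 25.0}
ok, slack = fits_budget(costs)  # 50 ms available, 45 ms used
```

With these made-up numbers, 5 ms of slack remains at 20 Hz, while the same
costs would no longer fit a 30-Hz target (about 33 ms per frame).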

As with any software, VR software should be developed in stages and in 
an iterative manner. The earliest iterations should focus on the usual aspects, 
such as requirements analysis, object and feature identification, class hier- 
archy, gross system behavior, user task modeling, and general software 
architecture. Once this stage of specification matures, the next stage would
involve more VR-related aspects, addressing performance issues and deciding
which computational modules identified in the first stage should be refined
further.
This would most likely include refinement of the user task models into 
interaction models, and the formation of appropriate computational and 
geometric Level of Detail (LOD) 5 models. Incremental simulation/execution 
can be applied to validate and revise important object behaviors, predict the 
approximate performance, and make further decisions for process distribu- 
tion (if appropriate tools exist). 
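
As a rough illustration of how geometric LOD models are used at run time, the
sketch below picks a model for one object from the viewer's distance. It
assumes a simple distance-threshold scheme; the thresholds and model names
are hypothetical, and real toolkits offer their own LOD mechanisms.

```python
import math

# Hypothetical LOD table for one object: (maximum viewing distance, model name),
# ordered from the most detailed model to the coarsest.
LODS = [(10.0, "car_high"), (50.0, "car_medium"), (math.inf, "car_low")]


def select_lod(viewer_pos, object_pos, lods=LODS):
    """Pick the first model whose distance range covers the viewer distance."""
    distance = math.dist(viewer_pos, object_pos)
    for max_distance, model in lods:
        if distance <= max_distance:
            return model
    return lods[-1][1]  # only reached if the last range is not infinite
```

A viewer 5 units away gets the detailed model, while one 100 units away gets
the coarse one, which lowers rendering cost where detail would not be seen.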



5 Level of detail models are geometric models of an object created at varying 
degrees of complexity for real-time rendering purposes (See Chapter 3).

Figure 1.4. The Spiral model of VR system development [Seo02]. (Reprinted with
permission from ACM © 2004.)

Once the performance aspect has been treated to a certain degree, the third
stage of iterations can address the issue of improving presence. In other 
words, the third stage decides whether to further employ nonhardware 
elements known to promote presence. Although these presence-enhancing 
elements (e.g., increasing input/output modalities, providing higher degrees 
of interactivity, increasing simulation fidelity, varying types of interaction, 
minimizing distraction, employing special effects, etc.) may possess only 
secondary importance in terms of system functionality (and may be regarded 
as targets of interest in much later stages of design), I believe that they are
sufficiently important to be considered upstream, because a defining quality of
a VR system is the provision of presence. However, we stress that consideration
of these elements is still grounded on at least some preliminary specification
of the most critical functionalities and performance requirements of the
system.
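
One way to picture this third stage is as a selection problem: given the
frame time left over after the core functionality, which presence-enhancing
elements can still be afforded? The sketch below uses a simple greedy
heuristic (best benefit per millisecond first). The heuristic, the element
names, the costs, and the benefit scores are all assumptions made for
illustration; they are not taken from the methodology described here.

```python
def choose_elements(candidates, spare_ms):
    """Greedily add elements with the best benefit per millisecond that still fit.

    candidates: list of (name, cost_ms, benefit) with cost_ms > 0.
    This is a heuristic; it does not guarantee the best possible combination.
    """
    chosen = []
    ranked = sorted(candidates, key=lambda c: c[2] / c[1], reverse=True)
    for name, cost_ms, benefit in ranked:
        if cost_ms <= spare_ms:
            chosen.append(name)
            spare_ms -= cost_ms
    return chosen


# Hypothetical costs (ms per frame) and benefit scores, for illustration only.
CANDIDATES = [
    ("spatialized sound", 2.0, 6.0),  # benefit per ms: 3.0
    ("object shadows", 4.0, 4.0),     # benefit per ms: 1.0
    ("cloth simulation", 6.0, 9.0),   # benefit per ms: 1.5
]
```

In practice the "benefit" of a presence element is hard to quantify and would
have to come from user studies or the developer's own experimentation.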

About This Book 

Many technical challenges lie in all facets of VR system development. This 
book is organized in such a way that it follows the development process 
depicted in Figure 1.4, and for each stage, describes the problem and 
possible solutions. It differs from (and, it is hoped, is more useful than)
other introductory books on virtual reality in that it provides concrete
examples and practical solutions (with actual code examples) to the technical 
challenges in building a VR system instead of just explaining the high-level 
concepts, following a specific development methodology. The book is pri- 
marily written for first-level graduate students. However, advanced under- 
graduate students or IT professionals can also follow the book without 
much difficulty. 

The first part of the book covers the very basics in building a VR system in 
a systematic way and explains various technical issues in object modeling 
and scene organization. The second part of the book dives into the core (one 
of the pillars) of virtual reality dealing with 3D multimodal interaction, and 
designing for usable and natural interaction. I start with reviewing various 
special VR input and output devices. Understanding their nature is critical 
for the user-centered approach to VR system design. Then, I go over how to 
conduct an interaction task analysis and design software/hardware inter- 
faces. As modeling and animating human characters are very important, 
I devote one chapter to it as a special case of object modeling. Simulation of 
important object behaviors such as physical simulation and collision detec- 
tion/response are also treated separately. Objects, characters, behaviors, and 
scene modeling are all deeply related to the issue of providing a high sense of 
presence to the user. The companion CD includes the actual code developed
in stages from examples that appear in the chapters (for detailed instructions
as to how to use them, refer to the readme files on the CD).

Final Notes

Also, note that there are other advanced topics not covered in this book such 
as the implementation of camera-based interaction, the use of image-based 
approaches (to modeling and rendering), biological man-machine interfaces, 
and so on. Among those topics not treated in the book, Augmented Reality 
(AR) is a particular type of VR system, and in AR, synthetic graphics/ 
images/text are registered and overlaid onto the real scene (or on video 
imagery) using a special device called the see-through head-mounted display 
or cameras and image/graphics compositing systems. In this sense, there is 
another continuum of VR-ness, called the Mixed Reality [Mil94], in terms of 
how much of the displayed scene is real (see Figure 1.5). Multiuser virtual 
environments can add yet another dimension to the virtual experience. With 
the advent of high-speed Internet connections, networked VR applications
are also growing in importance.

The underlying and enabling technologies of VR can be applied to many 
other areas, not just creating a virtual experience. For instance, VR can be 
viewed as an advanced form of Human-Computer Interaction (HCI), a new 
enriched communication medium, or an intuitive information visualization 
method. Frederick Brooks of the University of North Carolina, Chapel Hill 
has long advocated that the role of virtual reality should be that of
"Intelligence Amplification (IA)" (as opposed to artificial intelligence) [Rhe91].
He once stated that 

... I believe the use of computer systems for intelligence amplification is much more
powerful today, and will be at any given point in the future, than the use of
computers for artificial intelligence (AI). ... In the AI community, the objective is
to replace the human mind by the machine and its program and its database. In the
IA community, the objective is to build systems that amplify the human mind by
providing it with computer-based auxiliaries that do the things that the mind has
trouble doing.

Figure 1.5. The Mixed Reality continuum [Mil94].

I note that the type of VR system treated in this book is only representa- 
tive of larger possibilities. I leave it to the readers to think more about what 
exactly VR may, should, or can be, or what really constitutes a virtual 
experience. This book focuses more on the practical methods and technical 
basis and details of constructing a typical VR system.

Summary

Virtual reality is a field of study which aims to create a system that provides 
a synthetic experience for its user(s). A VR system usually consists of various 
types of displays for delivering the stimulation, sensors to detect user ac- 
tions, and a computer that processes the user action and generates the 
display output. VR is characterized by its defining objectives to achieve 
high user-felt presence in the virtual environment and the use of natural 
3D multimodal interfaces. Through presence and natural interaction, VR 
finds uses in education, training, entertainment, and many other application
areas. VR systems are highly complex and require a structured approach to
building them, starting with the basic functionalities and refining them with 
more VR features. 

Pondering Points 

• What are application possibilities for virtual reality other than education, 
training, and entertainment? 

• How can user-felt presence be beneficial to one's task performance? 

• Make a case for and against the role of realism in producing a convincing 
experience. 

• Can there be a VR system that is not mainly computer-based? 

• Is the real-life-based way of interaction the most natural and easy to use 
for humans? 

• How can preconception affect users in the virtual environment? 

• Is VR just another superfluous novelty or fad, or does it have a unique 
cost-effective benefit? 

• Can one feel "present" when reading books, playing video games, watch- 
ing movies, or through online chatting? What good can come from 
employing special modality output devices and natural interaction? 



Chapter 2 

Requirements Engineering 
and Storyboarding 



Good system engineering practice is vital to the successful development of 
VR systems, more so than ordinary software systems because VR systems 
have multifaceted requirements (not just to make correct computations). In 
fact, a typical development process for VR systems will go through many 
cycles of revisions, as there is a lack of design guidelines on how to effect- 
ively integrate various resource-consuming computations and interactive 
techniques. 

Thus, in building a VR system, we must start with identifying and de- 
scribing its requirements. Requirements [IEE94] are statements identifying a 
capability, physical characteristic, or quality factor that bounds a product or 
process need for which a solution will be pursued. Requirements refer to the 
desired properties of the system and the constraints under which it operates 
and is developed. Requirements should be documented and specified as 
clearly as possible, for ease of revision and later maintenance. Although 
requirements engineering is a difficult and cumbersome process, it should be 
done at least for the important core part of the system. These descriptions 
are best captured and maintained using computational support tools and 
formalisms, but in actuality, even hand-drawn sketches and documents 
(such as the storyboards) would be useful [Cim04]. 

Requirements may be functional or nonfunctional. Functional require- 
ments describe system services or functions. Nonfunctional requirements are 
constraints on the system or on the development process. There are many 
ways to go about doing requirements engineering for a VR system. For 
instance, we start with the functional requirements such as those about the 
scenes, virtual objects comprising the scene, behaviors, and the style of 
interaction. 
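
Even a very small data structure can keep the two kinds of requirements
apart while the specification is still informal. The following is a sketch
only; the Requirement record, its fields, and the sample entries for a
virtual driving simulation are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    FUNCTIONAL = "functional"        # a system service or function
    NONFUNCTIONAL = "nonfunctional"  # a constraint on the system or process


@dataclass(frozen=True)
class Requirement:
    ident: str
    kind: Kind
    statement: str


# Hypothetical entries for a virtual driving simulation.
REQUIREMENTS = [
    Requirement("F1", Kind.FUNCTIONAL, "The user steers the car with a wheel device."),
    Requirement("F2", Kind.FUNCTIONAL, "Other cars move along the road on their own."),
    Requirement("N1", Kind.NONFUNCTIONAL, "The scene is rendered at 20 Hz or faster."),
]


def by_kind(requirements, kind):
    """Return the identifiers of all requirements of the given kind."""
    return [r.ident for r in requirements if r.kind is kind]
```

Listing the functional entries first, as suggested above, gives the scenes,
objects, and behaviors to model; the nonfunctional ones then bound how they
may be implemented.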

Storyboarding is one way to start off the requirements engineering process. 
A storyboard is a visual script designed to make it easier for the director and 
cameraman to "see" the shots before executing them [Cri04]. It saves time 
and money for the producer and is used for making movies, commercials, and 
animation. There are structured ways to make storyboards, but for now, 
informal sketches and annotations suffice for our purpose (see Figure 2.1).

Figure 2.1. Modeling and implementing virtual objects in an object-oriented fashion.



The overall scenario, as represented in the simple form of sequences of 
"cuts" (or static scenes) in the storyboard, can be further refined and include 
some dynamics. One useful method is to use the Message Sequence Dia- 
grams (MSD) [DeM79], or use cases [Car98]. The MSD depicts typical 
scenarios of internal and external behaviors of a VR world in terms of 
sequences of data or control signals exchanged among objects in the system 
(See Figure 2.4). Using the MSDs, one can test the system for later model 
validation, but more importantly, it enables the developer to identify im- 
portant objects in the system. Constructing MSDs also aids in identifying the 
sequences of the messages among various objects and picturing how they 
interact with one another. In particular, external devices can be treated as an 
object for human-computer interaction. Object classes are then constructed 
by examining the identified objects and grouping them according to the 
commonality in their attributes. 
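
The two steps just described, identifying objects from message sequences and
grouping them into classes by common attributes, can be sketched as follows.
The scenario, the object names, and the attribute sets are hypothetical, and
grouping by identical attribute names is a deliberately crude stand-in for
the designer's judgment.

```python
# A message sequence as an ordered list of (sender, receiver, message).
# The scenario and all names are hypothetical.
SEQUENCE = [
    ("steering_wheel", "user_car", "turn(15 deg)"),
    ("user_car", "road", "query_surface()"),
    ("road", "user_car", "surface(asphalt)"),
    ("user_car", "display", "update_view()"),
    ("traffic_car_1", "user_car", "collision_warning()"),
]


def participants(sequence):
    """Objects in order of first appearance as sender or receiver."""
    seen = []
    for sender, receiver, _ in sequence:
        for name in (sender, receiver):
            if name not in seen:
                seen.append(name)
    return seen


def group_into_classes(attributes):
    """Group objects that share exactly the same attribute names."""
    classes = {}
    for obj, attrs in attributes.items():
        classes.setdefault(frozenset(attrs), []).append(obj)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical attributes noted for some of the identified objects.
ATTRIBUTES = {
    "user_car": {"position", "speed", "mass"},
    "traffic_car_1": {"position", "speed", "mass"},
    "road": {"geometry", "surface"},
}
```

Note that the external device (the steering wheel) shows up as an ordinary
participant, and that the two cars fall into one candidate class.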

Objects, better referred to as "virtual objects," are the constituents of a 
virtual environment through which the user will obtain the virtual experi- 
ence. Although there is a natural mapping from virtual objects to the 
"objects" in the object-oriented programming paradigm, virtual objects are 
rather just a modeling concept at least at this stage. As these virtual objects 
are later implemented as "objects" in an object-oriented computational 
platform (which would be a natural thing to do), they are interchangeably
referred to as both a modeling concept and a specific computational
implementation. Virtual objects, for their physical connotation, indeed lend
themselves naturally to the object-oriented system development methodology,
and this book chooses to illustrate the implementation details using the 
object-oriented platform. Note that the object-oriented approach can 
be used to model the concrete virtual objects and scenes they compose in 
the VE, and to abstract various functional services required to execute and 
manage them, for instance, device management, rendering control, object 
and scene creation/consolidation/importing, event management and 
communication, process management, and so forth. We use the Open- 
SceneGraph [Ope04] and the SGI Performer 1 to illustrate many of the 
concepts explained in this book (actual code samples may be found on the 
companion CD). OpenSceneGraph is an open-source high-performance 3D 
graphics toolkit written entirely in standard C++ and OpenGL. SGI Per- 
former is a popular commercial package for developing virtual reality 
applications. 

For a large-scale virtual environment with many sorts of objects, sketch- 
ing a rough object class diagram can be useful. A class diagram shows the 
existence of classes and their relationships in the logical and brief 
view format. The standard class diagram notation such as that of the 
Unified Modeling Language (UML) [Fow97] can be used. The diagram 
includes association, aggregation, composition, and inheritance relation- 
ships. Relationships provide a path for communication between objects. It 
is important to begin the overall modeling process with a consistent view of 
the object-orientation. With a clear picture of a system configuration 
in terms of constituent objects and information flows between them, 
the detailed specification of behavior, function, and form for each object can
begin. 
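
For readers less familiar with the four relationship types, here is how they
might look once a rough class diagram is turned into code. This is a sketch
in plain Python rather than in either toolkit used later in the book, and the
classes VirtualObject, Wheel, Driver, Car, and Scene are hypothetical.

```python
class VirtualObject:
    """Base class: every virtual object has a name."""

    def __init__(self, name):
        self.name = name


class Wheel(VirtualObject):
    pass


class Driver(VirtualObject):
    pass


class Car(VirtualObject):  # inheritance: a Car is a VirtualObject
    def __init__(self, name):
        super().__init__(name)
        # Composition: the wheels are created and owned by the car.
        self.wheels = [Wheel(f"{name}/wheel{i}") for i in range(4)]
        self.driver = None  # association: set from outside, may change

    def assign(self, driver):
        self.driver = driver


class Scene(VirtualObject):
    def __init__(self, name):
        super().__init__(name)
        self.members = []  # aggregation: members exist independently

    def add(self, obj):
        self.members.append(obj)
```

The references held by Car and Scene are also the paths along which messages
between the objects would travel.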

Virtual objects, just like physical objects, can be characterized by three 
main aspects: the form, function and behavior. Form refers to the 
outer appearance of virtual objects, and their physical properties and
structure. 2 We usually associate "appearance" with the visual sense (how it
looks); however, a form or appearance must be judged with respect to
ways it can stimulate humans through the display devices. Thus, form may 
include appearances also in terms of audition, haptics (force feedback), and 
other modalities that humans possess. For simplicity, we concentrate on the 
visual part for now, but later in the book, we will talk about modeling and 
simulation of nonvisual appearances. Other physical properties (which may 
be required for physical simulation) such as mass, material property, vel- 
ocity, and acceleration may be included as part of form information. 



1 Performer is a registered trademark of Silicon Graphics, Inc. A free month-long 
evaluation version of Performer is available at www.sgi.com. 

2 Structure refers to the spatial/logical relationship among component objects in 
the case where the given object is a composite one.

Function refers to encoding what virtual objects do (i.e., primitive tasks) to
accomplish their behavior (defined below), whether autonomously or in 
response to some external stimuli or event, and behavior refers to how 
individual virtual objects dynamically change and carry out different func- 
tions over a (relatively long) period of time, usually expressed through states, 
exchange of data/events, and interobject constraints. It is somewhat difficult 
to clearly draw the line between function and behavior. Functions may be 
viewed as primitive behaviors that are mostly atomic and take a relatively
short amount of time. Separating them, nevertheless, is useful for modular
design of object dynamics. The description of objects, as part of a formal or 
informal specification of the overall application or system, must address 
these aspects. Note that there may be objects without form (purely compu- 
tational objects such as device interfaces) or without function or dynamic 
behavior (e.g., static nonmoving objects such as virtual rocks). 
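
The separation of form, function, and behavior can be made visible in code.
The sketch below models a hypothetical automatic door: its form is a set of
attributes, its functions are the short primitive tasks open and close, and
its behavior is the longer-running rule that decides when those functions are
invoked. All names and values are assumptions for illustration.

```python
class VirtualDoor:
    """Hypothetical door object separating form, function, and behavior."""

    def __init__(self):
        # Form: appearance and physical properties.
        self.form = {"size_m": (0.9, 2.0), "color": "brown", "mass_kg": 25.0}
        # State used by the behavior.
        self.state = "closed"
        self.ticks_open = 0

    # Functions: short, primitive tasks.
    def open(self):
        self.state = "open"
        self.ticks_open = 0

    def close(self):
        self.state = "closed"

    # Behavior: how the object changes over time and in response to events.
    def step(self, user_nearby, hold_ticks=3):
        if user_nearby:
            self.open()
        elif self.state == "open":
            self.ticks_open += 1
            if self.ticks_open >= hold_ticks:
                self.close()
        return self.state
```

Stepping the door with a user nearby once and then absent keeps it open for
three more steps before it closes again. Changing the hold time alters the
behavior without touching either function, which is the modularity argued for
above.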

So, for instance, the form specification/description would start by captur- 
ing the initial approximate shape/volume as well as the physical configur- 
ation of those objects (e.g., a simple hand-drawn sketch will do). As the 
description gets more mature and goes through a number of refinement 
iterations, the objects could decompose into smaller components (e.g., by 
breaking a car into its components, such as body, wheels, doors, etc.). Values 
of important attributes (e.g., size, color, mass, object type, etc.) may be 
added to this description as well. These descriptions are best captured and 
maintained using computational support tools and formalisms, but in actu- 
ality, hand-drawn sketches and documents (such as the storyboards) would 
still prove useful. More detailed explanations of the modeling and initial 
implementation process are given in Chapters 3 and 4. 
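
The refinement of a form description over several iterations can be pictured
with a small tree structure: the car starts as one rough object and is later
decomposed into components that carry their own attribute values. The Part
record and the masses below are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """One node in a hypothetical form description."""
    name: str
    mass_kg: float = 0.0  # mass of this node alone
    color: str = "unspecified"
    children: list = field(default_factory=list)

    def total_mass(self):
        """Mass of this part plus all of its components."""
        return self.mass_kg + sum(c.total_mass() for c in self.children)

    def outline(self, depth=0):
        """Indented listing of the component structure."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


# First iteration: the car as a single rough object (values are made up).
car = Part("car", mass_kg=1200.0, color="red")

# Later iteration: the same car decomposed into components.
car_refined = Part("car", color="red", children=[
    Part("body", 900.0),
    Part("doors", 100.0),
    Part("wheels", 200.0),
])
```

Both versions report the same total mass, so code that only needs the car as
a whole is unaffected by the refinement.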

Construction of virtual objects and their world often requires many 
revisions, and changing one aspect of the world will undoubtedly affect 
other aspects of it. For instance, different shapes and configurations (posi- 
tions and orientations in space) can result in different dynamic behaviors. 
A jet fighter has different aerodynamic characteristics from those of a
passenger airplane. Form can also affect functionality. For instance, two different
robots differing in size may have different work volumes and capabilities. 
Such a development cycle is difficult to handle when working in a single level 
of abstraction and considering these design spaces in isolation. 

Object functions and behaviors can likewise be described using anything from
plain text to more structured and diagrammatic representations
such as procedural scripts, state transition diagrams, data flow diagrams, 
constraint languages, and the like. The choice of representation should be 
based on the complexity and nature of the object behavior and also on 
the type of behavior model supported by the VR development platform 
(so that the description can be easily mapped to and implemented at a 
later time). For instance, some game engines support state-based automata
to express and implement intelligence in objects. Simpler VR development
platforms support only procedural programming for object behavior
implementations. See Chapter 4 for more details. Figure 2.1 illustrates this
initial modeling process as demonstrated in this book.
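
As a rough illustration of the state-based style of behavior description mentioned above, the following C++ sketch encodes the behavior of a hypothetical virtual door as a small state transition function. The names DoorState, DoorEvent, and nextState, as well as the transitions themselves, are assumptions made for this example only; they are not taken from any particular game engine or VR platform.

```cpp
#include <cassert>

// Hypothetical behavior model for a virtual door (illustrative names only).
enum class DoorState { Closed, Opening, Open, Closing };
enum class DoorEvent { UserPush, AnimationDone };

// Returns the next state for a (state, event) pair. Pairs that are not
// listed leave the state unchanged.
DoorState nextState(DoorState s, DoorEvent e) {
    switch (s) {
        case DoorState::Closed:
            return e == DoorEvent::UserPush ? DoorState::Opening : s;
        case DoorState::Opening:
            return e == DoorEvent::AnimationDone ? DoorState::Open : s;
        case DoorState::Open:
            return e == DoorEvent::UserPush ? DoorState::Closing : s;
        case DoorState::Closing:
            return e == DoorEvent::AnimationDone ? DoorState::Closed : s;
    }
    return s;  // not reached for valid enum values
}
```

Each case corresponds to one node of a state transition diagram, and each condition to one labeled arc, so a diagram drawn at specification time can be mapped onto such code fairly directly.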

Another equally important functional requirement concerns user inter- 
action. The storyboard and the MSD identify the important junctions and 
events at which user input is required. The task required to be carried out by the 
user should be refined to some degree and matched with the capabilities of the 
hardware devices and computational power of the computing hardware. The 
method of interaction modeling and interface design is treated in Chapter 5. 

A related problem to interaction is the designation of the proper display 
devices. Different display systems are suited for different tasks and
situations. For instance, HMDs are more suited for close-range manipulation
tasks, whereas large projection displays are suited for navigation and
walk-through applications. Whether to employ head-tracking, haptics, 3D sound,
and so on is an important interaction-related decision to make. Generally, 
sensors and displays cannot be changed during their use. They are also 
generally expensive, and one might not have the luxury of choosing the 
best possible displays and sensors. A clever design of the contents can 
overcome some of the limits introduced by low-end displays and sensors. 
Thus, at an early stage, having a rough idea of the nature of the user tasks 
and interactions (e.g., style of input and response to input) is helpful in 
determining the right displays and sensors and in recognizing the limits and 
bounds introduced by the hardware for providing a suitable level of presence 
and usability. Also note that there may be interaction objects (those that are 
purely functional such as device polling, or those also with form such as 
menus) to consider as well. Putting the user in the center of the system design 
process is very important as many VR systems fail simply because they are 
not user friendly. 

The important nonfunctional requirements to consider at this stage are 
requirements for the overall system performance and device constraints. The 
performance requirement is rather simple. A virtual reality system is a real- 
time system, and must make computations for simulations, synchronize its 
output with various input devices, and maintain display updates at a rate at 
which human users will feel comfortable. For instance, for smooth computer
graphic animations, the simulation updates should be made at least about
15 to 20 times per second. Other input or display devices may have
different timing requirements (for instance, haptic equipment ideally
requires an update rate of up to 1000 Hz for delivery of smooth force
feedback). Note that 1/15th of a second is an amount relative to the capability of the
computational and graphics hardware. Thus, if the functional requirements
cannot be accommodated by the nonfunctional constraints such as the
performance bounds or the devices, they have to be addressed in some
way, either by making a business decision to purchase the appropriate
equipment or later by designing to overcome the resulting distraction factors
through clever content psychology. The important thing is that this be
known in the early design stage.
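
The update rates quoted above translate directly into a time budget per update cycle. The short C++ sketch below shows this arithmetic; the function names frameBudgetMs and meetsUpdateRate are hypothetical helpers introduced for this example, and the per-cycle cost passed to them would in practice have to be measured or estimated on the target hardware.

```cpp
#include <cassert>

// Time available for one update cycle, in milliseconds, at a given rate.
double frameBudgetMs(double updateRateHz) {
    assert(updateRateHz > 0.0);
    return 1000.0 / updateRateHz;
}

// True if the estimated cost of one cycle (simulation plus output) fits
// within the budget implied by the target update rate.
bool meetsUpdateRate(double estimatedCostMs, double targetRateHz) {
    return estimatedCostMs <= frameBudgetMs(targetRateHz);
}
```

At 20 updates per second the budget is 50 ms per cycle and at 15 updates per second it is roughly 66.7 ms, whereas a 1000 Hz haptic loop leaves only 1 ms. A computation that fits comfortably in the graphics loop may therefore be far too slow for the haptic loop.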

Finally, a developer needs to understand, once again, that making these
requirements and implementing them is an iterative process, starting from a
rough picture and being refined stage by stage. How far should the
requirements and implementation be refined? That is left to the discretion
of the developer.

Example: Ship Simulator Design 

We illustrate this initial modeling process more concretely by walking through
the design of a simple virtual ship simulator. The objective of the example
application is to help trainees navigate in and out of the pier and anchor
without colliding with other vessels or the coast. Figure 2.2 lists the initial
requirements for the simulator. Given these high-level goals and informal 
requirements of the system, we start with sketches of the storyboards as 
shown in Figure 2.3. 

Requirements (Level 1) 

• The virtual ship simulator (named ShipSimulator) helps users (named User)
operate a vessel (named MyShip) and practice docking without colliding with
other vessels (named OtherShip) or the coast.

• Initial View

- The default view (named Camera) is the scene as seen from the control
bridge where the User controls the ship (MyShip). The User can see the
outside environment through the windows in the bridge.

• Interaction

- The control bridge includes a steering wheel (named SteeringWheel) and
an engine lever (named EngineTelegraph) for the User to steer and control
the velocity of MyShip.

- The User can look around the interior of the bridge and change the view
(named Camera).

- The basic mode of control via keyboard (named Keyboard) and mouse
(named Mouse) must be supported. ShipSimulator shall accept input from
the Keyboard to control MyShip.

• Models

- The bridge includes a steering wheel (named SteeringWheel) and an engine
lever (named EngineTelegraph).

- The scene must also include object models for sky, sea, other ships, terrain,
and pier.

• Simulation

- ShipSimulator controls several automatically navigated vessels
(OtherShip).

- The initial positions and moving directions of each OtherShip are chosen randomly.

- Each OtherShip changes its speed and direction every 10 seconds.

Figure 2.2. The initial requirements for the virtual ship simulator. 
Figure 2.3(a). The default starting view of the ShipSimulator. The interior of the
control bridge is seen with the steering wheel, engine lever, outside view, and gauges. 
The User can look around the control bridge. 




Figure 2.3(b). The external view of Figure 2.3a. A number of ships (including 
MyShip) move around the sea. This view can be selected by separate keyboard/ 
mouse control. 
Figure 2.3(d). The external view of Figure 2.3c.

Figure 2.3(f). The external view of Figure 2.3e.

As shown in this simple storyboard, the three major objects are identified
first: the trainee vessel (called MyShip), other automatically controlled
vessels (called OtherShip), and the central simulation control module (called
ShipSimulator). MyShip is composed of, among other things, SteeringWheel
and EngineTelegraph (the user interface for vessel control). We also identify
an interface object, the Keyboard (for various ship and training control
functions), and an object representing the camera position, Camera.

The specification starts by creating simple scenarios using the MSD as 
depicted in Figure 2.4. Figure 2.4a is the first simple example of the MSD, a 
trainee interaction scenario for "looking around" on the control bridge. 
When the User enters a key, it is stored by the interface object Keyboard, 
and the User checks what kind of keys were pressed (e.g., "z" for looking to 
the left), and the Camera is updated accordingly. A similar interaction 
scenario is given in Figures 2.4b and c where the User communicates to 
the Keyboard (pressing the up/down/left/right arrow keys) to control the 
speed and the course of MyShip. In Figure 2.4d, the OtherShip sets its own 
initial position and direction in a random fashion and changes its speed and 
direction every 10 seconds.
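
The OtherShip scenario can be sketched in code as follows. This is only an illustrative C++ outline under several assumptions: the class layout, the constants kMaxSpeed and kSeaSize, the use of a seeded std::mt19937 generator, and the simplification that a ship keeps its old course for the whole time step in which a change occurs are all choices made for this example. Only the random initialization and the 10-second change period come from the requirements in Figure 2.2.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical OtherShip: random initial pose, then a new random speed and
// course every kChangePeriod seconds of simulated time.
class OtherShip {
public:
    static constexpr double kChangePeriod = 10.0;  // seconds (Figure 2.2)
    static constexpr double kMaxSpeed = 5.0;       // assumed, units per second
    static constexpr double kSeaSize = 1000.0;     // assumed side of a square sea

    explicit OtherShip(unsigned seed) : rng_(seed) {
        std::uniform_real_distribution<double> pos(0.0, kSeaSize);
        x = pos(rng_);
        y = pos(rng_);
        randomizeMotion();
    }

    // Advance the ship by dt seconds of simulated time.
    void update(double dt) {
        assert(dt >= 0.0);
        x += speed * std::cos(heading) * dt;
        y += speed * std::sin(heading) * dt;
        sinceChange_ += dt;
        while (sinceChange_ >= kChangePeriod) {
            sinceChange_ -= kChangePeriod;
            randomizeMotion();
            ++changeCount;
        }
    }

    double x = 0.0, y = 0.0;  // position on the sea plane
    double heading = 0.0;     // course in radians
    double speed = 0.0;       // units per second
    int changeCount = 0;      // number of periodic changes so far

private:
    void randomizeMotion() {
        std::uniform_real_distribution<double> course(0.0, 2.0 * kPi);
        std::uniform_real_distribution<double> spd(0.0, kMaxSpeed);
        heading = course(rng_);
        speed = spd(rng_);
    }

    static constexpr double kPi = 3.14159265358979323846;
    std::mt19937 rng_;
    double sinceChange_ = 0.0;  // seconds since the last change
};
```

In a complete simulator, a central module such as ShipSimulator would presumably call update once per frame for every OtherShip instance, passing the elapsed frame time.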

Figure 2.4(a). MSD for simple keyboard-based view control.

Figure 2.4(b), (c). MSD for controlling MyShip's velocity and direction using the
arrow keys.

Figure 2.4(d). MSD for initializing and updating an instance of an OtherShip.

An initial class definition (with major functionalities specified based on
the content of the messages exchanged) and the class diagram are designed
as depicted in Figure 2.5, which shows the simplified class diagram
created by constructing various MSDs. Notice that the interaction object
Keyboard and the ShipSimulator are purely "functional" without any form.
As noted, the relations between classes are clarified at this stage of the
modeling. A trainee can operate MyShip through Keyboard, then MyShip
changes Camera. He or she can also change the orientation of Camera
through Keyboard but the change in Camera does not affect MyShip. This
initial class diagram will be subject to revision during the next phases of
development.

Figure 2.5. An initial class definition for the ship simulator.

Summary

VR system design starts with listing the requirements and carefully analyzing 
them as to whether virtual reality is even needed in the first place. The 
requirements must be centered around the user's expectations and
capabilities. For instance, experience-oriented requirements will result in a
system with emphasis on presence, whereas task-oriented requirements
will place emphasis on efficient interaction. Based on the requirements, the
overall scenario can be constructed using storyboards. "Virtual" objects that
make up the scene are identified, and the basic specifications for their form,
function, and behavior are made. Other aspects of the system such as
device constraints, interaction, major special effects, and presence cues are
also noted in this early stage of system development. Major interobject
relationships are made more explicit by drawing class diagrams and message 
sequence diagrams. 

Pondering Points 

• Characterize the form, function, and behavior for a virtual human, virtual 
rock, virtual airplane, and virtual wind. 

• What are possible barriers to making a VR system run in real-time? 

• Make a case for, and against, carrying out requirements engineering 
at all. 

• Make a case for, and against, using abstract formalism, support tools, 
or even documentation for requirements and system specifications. 

• Is the object-oriented paradigm most fitting for implementing VR
systems?

• Can having too many interaction points in the VR content be detrimental 
to inducing a good convincing virtual experience? 

• In achieving the intended level of virtual experience, how can one make a 
good decision, for instance, between purchasing a special device for the 
increased effect, and staying with the less capable one and overcoming its 
shortcoming using other tricks? 



Chapter 3 

Object and Scene Modeling 



Object Modeling and Initial Implementation 

Virtual objects, given their physical connotation, map naturally to the
objects of the object-oriented system development methodology. Note that
the object-oriented approach can be used not only to model the objects and
scenes in the VE, but also to abstract the various computational services
needed for the VE, for instance, device
management, rendering control, object importing, event management, pro- 
cess management, and so forth. In this chapter, we look at different ways 
to implement the form and function/behavior of the virtual objects based 
on the rough requirements and specification exemplified in the previous 
chapter. 

Geometric (Form) Modeling/Implementation 

Most VR applications are developed in the sequence of form modeling,
followed by function/behavior programming. The most popular
method of form modeling is to use Computer-Aided Design (CAD) systems. 
Then, if necessary, the CAD output must be converted into the appropriate 
file format or data structure to be used by the VR execution environment 
(e.g., for rendering and displaying it). The VR development/execution en- 
vironment is a computational layer built on top of an underlying basic 
system support (such as for graphics, device interfaces, and system control) 
that provides abstraction for developers to program (through a set of library 
routines) and run their programs. Figure 3.1 (left) shows the structure of this
computational layer.

Figure 3.1 (right) also shows the structure of a CAD system, which is in
fact another computer graphics application. It allows users to enter
commands through the keyboard and the mouse, and visually construct the
appearance/geometry of an object. The geometry and other properties of
the object in construction all have internal computational representations
(or data structures). For these objects to be "used" and drawn for your VR
system, they need to be put in a certain file format and then be imported into
your run-time VR environment. Whatever file formats or internal
representations different modelers and computer graphics systems use, they
ultimately must be converted into a list of (flat) polygons/triangles, because that is
what today's graphics hardware uses to produce images on the screen. There
exist popular CAD systems, and there exist popular VR run-time
environments, and they are not necessarily compatible with one another. For instance,
a popular geometric modeler 3DSMax 1 can produce an output file in a format
called ".obj", and certain VR development/execution environments
will not be able to import that file format and convert it into the list of polygons
to render on the screen. One of the first things that the VR developer must
worry about is this compatibility: whether a given CAD model can be
imported into the chosen VR development/execution environment. Certain
VR packages advertise that they are able to import various kinds of file
formats, but it is always safer to double-check for "full" compatibility.

Figure 3.1. A VR execution environment and importing a CAD model for a virtual
object.

CAD data not only include the geometry information, but also much
other useful information such as the hierarchical structure of the object,
color, texture, lighting properties, material, and even simple behavior or
animation sequences. Certain environments will only be able to import
part of this information. Note that the imported information is merged into
yet another internal representation that a given VR environment
will use. Even though the job of object modeling is often delegated to
graphic designers (i.e., expert CAD users), it is beneficial for a VR developer
to be familiar with the modeling process and the internal model
representations used by the CAD system, because the way objects are modeled and
represented internally can later have an impact on the performance of the
virtual reality system that uses these object models.

1 3DSMax is a registered trademark of Discreet, Inc.



Various Representations for Geometry 

In this line of reasoning, we discuss the basis for representing various types 
of geometry. One of the simplest ways to represent geometry is through 
mathematics such as algebraic equations. Figure 3.2 shows a few varieties. 
Such mathematical representations come in several flavors:
the parametric, explicit, and implicit forms (see Figure 3.3).

Thus, for instance, an internal representation for an "explicit" line would 
have an identifier for the object and the entity type (i.e., it is a line) and the 
values for the coefficients of the line equation. Or, a "parametric" curve/ 
surface can be specified in a similar fashion by providing the control points 
(by mouse clicks). Certain computations are made easier depending on the 
type of mathematical representation used. For instance, although the implicit 
representation can be used conveniently for collision detection (as illustrated 
in Figure 3.4), the parametric representation is more convenient for figuring 
out "where" something (such as a point) is on a given surface or a curve. 
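
To make the contrast between the two forms concrete, the following C++ sketch treats a sphere centered at the origin in both ways. The implicit form f(x, y, z) = x^2 + y^2 + z^2 - r^2 is evaluated for a simple point-in-sphere collision test, and a parametric form (latitude and longitude angles) is used to locate a point on the surface. The type and function names are hypothetical, and the particular angle convention is an assumption chosen for this example.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Implicit form of a sphere of radius r centered at the origin:
// negative inside, zero on the surface, positive outside.
double sphereImplicit(const Vec3& p, double r) {
    assert(r > 0.0);
    return p.x * p.x + p.y * p.y + p.z * p.z - r * r;
}

// Collision test: the point touches or penetrates the sphere.
bool collidesWithSphere(const Vec3& p, double r) {
    return sphereImplicit(p, r) <= 0.0;
}

// Parametric form: phi is the latitude in [-pi/2, pi/2] and theta is the
// longitude in [0, 2*pi). Returns the surface point for (phi, theta).
Vec3 sphereParametric(double r, double phi, double theta) {
    return Vec3{r * std::cos(phi) * std::cos(theta),
                r * std::cos(phi) * std::sin(theta),
                r * std::sin(phi)};
}
```

The collision test needs only a sign check on the implicit function, whereas answering "where on the surface" is a direct evaluation of the parametric form. Doing either job with the other representation would require solving equations.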




Figure 3.2. Examples of algebraic representations of various geometric entities. 
Figure 3.3. Variations in mathematical representations: parametric, implicit, and
explicit forms.



Figure 3.4. Different mathematical representations used for collision detection and
point localization.

Although these mathematical equations can be useful, it would be quite
difficult to model a certain object entirely mathematically (e.g., what is the
mathematical representation for a chair?). To construct more complex-looking
objects (such as a chair), one good way might be to use the mathematical
representation for the primitives, but put the primitives together "manually."
One such approach is Constructive Solid Geometry (CSG) (see Figure
3.5). In CSG-based CAD systems, an intuitive set of operators such as
union, intersection, and difference can be applied to compose and create
complex-looking objects. The mathematics behind this (for computing the
resultant shape) is very complicated and beyond the scope of this book.
Another approach is to use constraints. Constraints are helpful, for instance,
in positioning one object in relation to another in a precise way. For
example, a constraint that says, "face X needs to align with edge Y and
edge Z" can be input prior to making the manual placement, and this way,
the degrees of freedom (needed to place the face along the edges Y and Z) can
be limited in order to guide the placement in the most convenient way. After
the primitives are put together, the overall geometric configuration can be
captured simply by recording the relative position and orientation (e.g., 4 x 4
transformation matrices; see later part of this chapter) between the
constituent primitives. A virtual object composed of several constituent
primitives (or subobjects) can be represented hierarchically using a tree data
structure. In this hierarchy, the coordinate system of children objects
(constituents of the parent object) is defined relative to that of the parent object.
When a motion (or certain transformation) is applied to the parent, all of its
children are affected by it as well.

Figure 3.5. The modeling concept of Constructive Solid Geometry (CSG).
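
The hierarchical representation described above can be sketched with a small tree of nodes, each storing a 4 x 4 transformation relative to its parent. The C++ code below is a minimal illustration, not the data structure of any particular CAD system or VR platform: the names Mat4, Node, and collectWorldOrigins are hypothetical, matrices are stored row-major, and only translations are constructed to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <vector>

using Mat4 = std::array<double, 16>;  // row-major 4 x 4 matrix
using Point3 = std::array<double, 3>;

Mat4 identity() {
    Mat4 m{};  // all zeros
    m[0] = m[5] = m[10] = m[15] = 1.0;
    return m;
}

Mat4 translation(double tx, double ty, double tz) {
    Mat4 m = identity();
    m[3] = tx; m[7] = ty; m[11] = tz;
    return m;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                c[4 * i + j] += a[4 * i + k] * b[4 * k + j];
    return c;
}

// One object (or subobject) in the hierarchy.
struct Node {
    Mat4 local = identity();  // pose relative to the parent
    std::vector<std::unique_ptr<Node>> children;

    Node* addChild(const Mat4& childLocal) {
        children.push_back(std::make_unique<Node>());
        children.back()->local = childLocal;
        return children.back().get();
    }
};

// Depth-first traversal that records the world-space origin of every node.
// A node's world transform is its parent's world transform times its own
// local transform, so moving a parent moves all of its descendants.
void collectWorldOrigins(const Node& n, const Mat4& parentWorld,
                         std::vector<Point3>& out) {
    const Mat4 world = multiply(parentWorld, n.local);
    out.push_back(Point3{world[3], world[7], world[11]});
    for (const auto& child : n.children)
        collectWorldOrigins(*child, world, out);
}
```

For example, if a car body is placed at (10, 0, 0) and a wheel is attached at (1, -0.5, 0) relative to the body, the traversal reports the wheel at (11, -0.5, 0). Changing only the body's transform moves the wheel along with it.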

Another simple way to model objects is through manually constructing an
object using even more primitive entities such as polygons, triangles, edges,
and vertices. Note that (flat) polygonal representation is what is required
by the graphics hardware for rendering. Because a polygon is made up
of vertices and edges, a polygon-based modeler would allow users to
place vertices and connect them to form edges and faces, in the virtual 3D
space, to construct a polygonal object model (see Figure 3.6). Such a model
is also called the B-Rep (Boundary Representation) or wire-frame because
the exact surface is not really mathematically defined (only the boundaries
are). However, we cannot assume that the polygons constructed by
specifying a number of vertices and their connectivity will be flat. This is because a
polygon with more than three vertices is not guaranteed to lie on a flat plane
(however, triangles are). Thus, polygons are often ultimately further
decomposed into triangles by a separate process (i.e., triangularization 2 ).

Figure 3.6. Polygons constructed by placing vertices in 3D space and specifying their
connectivity. Note that their surfaces may not be flat. Thus, they are decomposed
into triangles by subdividing the polygons by a triangularization process. Any
polygon can be split into a number of triangles.
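
One of the simplest ways to decompose a polygon into triangles is fan triangulation, sketched below in C++. This is an illustrative choice rather than the algorithm used by any specific modeler, and it is only guaranteed to give a sensible result for convex polygons; general concave polygons need a more elaborate method. The polygon is given as a list of indices into a vertex array, and the names Triangle and fanTriangulate are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Triangle = std::array<std::size_t, 3>;  // three vertex indices

// Splits a convex polygon with n vertices into n - 2 triangles that all
// share the polygon's first vertex.
std::vector<Triangle> fanTriangulate(const std::vector<std::size_t>& polygon) {
    assert(polygon.size() >= 3);
    std::vector<Triangle> triangles;
    triangles.reserve(polygon.size() - 2);
    for (std::size_t i = 1; i + 1 < polygon.size(); ++i)
        triangles.push_back(Triangle{polygon[0], polygon[i], polygon[i + 1]});
    return triangles;
}
```

A quadrilateral with vertex indices 4, 7, 9, 2 thus becomes the two triangles (4, 7, 9) and (4, 9, 2). Each resulting triangle is planar even if the four original vertices were not.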

These can be simply displayed with the content being "hollow" (surfaces 
not filled in). Naturally, it will be difficult to model complex shapes (espe- 
cially those with fancy curves and surfaces) by means of this method. Even 
smooth curves or surfaces can be approximated with polygon models given a
sufficient number of vertices, edges, and faces (see Figure 3.7). However, this
is problematic, because it would be too tedious for the user to input the
positions of these vertices and their connectivity information. Perhaps the
best approach is to offer the CAD user an intuitive and natural interface to
indirectly create basic primitives in a parametric way and compose them
using methods such as the CSG and constraint-based methods. Then,
automatic polygonalization and triangularization algorithms can be called 
to produce the polygon/triangle list data structure (and the file format to be 
exported). 

One shortcoming of using the automatic polygonalization (or
triangularization) method is the lack of flexibility in controlling the number of
polygons it produces for a given model. The performance of a graphics
subsystem is quite dependent on the number of polygons of the objects to
be drawn. Some CAD systems do offer advanced polygonalization (or
triangularization) routines in which a target polygon count can be specified
for the output, and even different degrees of polygonalization (or
triangularization) for different parts of the model. Also note
that a polygon list can be easily converted into a triangle list (because any
polygon can be split into a number of triangles).



2 Triangulation is also often called tessellation. 
Figure 3.7. Modeling an object using a CAD system with various representations,
and converting them into a polygonal data/file. 

The final method of modeling is the procedural modeling method. As its
name implies, one can create and specify the geometry of the model, relative
locations and orientations of subobjects, their color, and other properties in
a procedural manner, that is, by writing a program that consists of
commands that accomplish the job (see Figure 3.8). Libraries of
computer graphics or VR engines usually include a small set of routines for
creating simple primitives (in a parametric way) such as cubes, spheres,
cones, ellipses, and the like. Again, it would be quite difficult to represent
a complex shape using this method. However, it is sometimes convenient to
use these code-generated parametric primitives to prescribe changes to the
model (e.g., shape change, replacing textures or colors) during run-time.

Making a sphere

//arena and sceneGraphRoot are assumed to be set up elsewhere
pfGeoSet *sphereGset;
pfGeode *geode = new pfGeode;

//make unit sphere at (0,0,0)
//using 100 triangular faces
sphereGset = pfdNewSphere(100, arena);
geode->addGSet(sphereGset);

//parent node in scene graph
sceneGraphRoot->addChild(geode);

//set radius of sphere
sceneGraphRoot->setScale(20);

//set sphere origin position
sceneGraphRoot->setTrans(10,20,30);

Making a cube

//arena and sceneGraphRoot are assumed to be set up elsewhere
pfGeoSet *cubeGset;
pfGeode *geode = new pfGeode;

//make unit cube at (0,0,0)
cubeGset = pfdNewCube(arena);
geode->addGSet(cubeGset);

//parent node in scene graph
sceneGraphRoot->addChild(geode);

//set xyz length of cube
sceneGraphRoot->setScale(20,20,30);

//set cube position
sceneGraphRoot->setTrans(10,20,30);

Figure 3.8. Procedural routines for creating and placing primitive objects such as a
sphere and a cube using the Performer programming library. 3

3 Performer is a registered trademark of Silicon Graphics, Inc.

Remember that there can be information relevant to "form" other than 
just geometry. Nongeometric form information can include the usual object 
hierarchy, color, material, and physical properties such as mass, position, 
velocity, acceleration, force, angular velocity, angular acceleration, torque, 
and so on. Developers can define additional properties as needed. The object 
modeler (e.g., CAD system) may allow encoding of such information in 
addition to the geometry. If not, the user must document it separately,
either manually or using other tools.

An external data importer of a given VR development/execution platform should
interpret the form information created by the object modeler and merge it
into its own object model. As mentioned before, even though many types
of information may be specified using a given modeling tool, when imported
into the VR development/execution platform, some of the information
can be lost (see Figure 3.9). For instance, using a modeling tool like
3DSMax, one can create not only nice-looking geometries for an object, but
also its structure, simple behavior, and animation sequences. However, when
importing the ".obj" file into a certain VR platform, not all information may
survive the conversion process. One of the most critical pieces of "form"
information is the object hierarchy (i.e., subobjects and their relationships).
In 3DSMax, the object may have consisted of subobjects in a hierarchical
manner, with each subobject having an independent status with a separate
polygon list. However, a given VR platform, in reading and importing the
".obj" file, may reconstruct the object as one monolithic polygon list for the
whole object. Thus, for instance, one will not know where the right leg of a
human model is (or which polygons correspond to the leg) because
the resultant model is just a big polygon list. A more desirable conversion
process would be one that can preserve or reconstruct the data as a
hierarchical tree of polygon lists, the same as the input.

Figure 3.9. Illustration of the possible loss of information during the model file
import process.

Performance-Conscious Form Modeling 

Most approaches to dealing with the real-time performance requirements of 
VR systems have focused on reducing the number of objects/polygons that 
need to be processed by the graphics hardware. Image-based rendering is an 
extreme example of this approach, which essentially eliminates any use of 3D 
polygons. Instead it uses and requires computations on images. The images 
can be for individual objects or for a good part of the whole environment. 
Creative use of textures or image-based modeling is a good performance 
optimization technique that not only provides reasonable realism, but also 
relieves users of the burden of tedious modeling efforts (see Chapter 5). 
However, its full-blown usage for VR is still an open question due to some
unresolved issues, such as the low interactivity, visibility problems, storage 
requirements, and the intensive image processing required for correct view 
generation. 

Another approach to reducing the graphics overhead is to figure out, at a 
given frame to be drawn, the part of the virtual world visible from the given 
viewpoint and only render that. These techniques usually rely on a particular
model structure (e.g., an indoor building) or on the designation of occluding
objects or polygons and their occludees. However, these approaches require
assumptions that may not be applicable to the construction of general VR worlds.

One of the most popular ways to tune a VR system performancewise is to 
use the levels of detail (LOD). An object may be associated with, not one, 
but multiple geometric models of varying details, known as the LODs, and 
the system can dynamically switch among them to maintain an acceptable 
frame rate depending on system load and the importance of the object. The 
conventional approach to preparing for (geometric) LODs for virtual ob- 
jects is through a process called polygon budgeting [Hof97]. With polygon 
budgeting, important (regarding their potential rendering cost) virtual
objects in the scene are identified and assigned the appropriate number of
polygons, depending on the scene complexity and the limitations of the
target graphics hardware. The budget is used by the model builders as a
guide as they create the geometric models using the geometric modelers and
CAD systems. In practice, models are usually created with utmost detail and
then simplified in reverse (using mesh simplification algorithms 4 ) down to
several levels of detail as needed.

One of the standing problems with simplification algorithms is the preser- 
vation of geometric features (e.g., topology, curvature, vertex position, etc.). 
There is an inherent trade-off between degrees of simplification and preserva- 
tion of the original appearance, features, and shape. Moreover, these algo- 
rithms (as implemented in the geometric modelers) are applied to the overall 
model, and do not usually allow the user to "selectively" simplify or preserve 
certain segments or features that may be visually significant. Ideally, with the 
polygon budget, the geometries must be created and refined from a rough 
model to a detailed one as part of a hierarchical and incremental development 
process. Such a process promotes a performance-conscious design by forcing 
the developer to focus on the more critical features in form, function, and/or 
behavior in a top-down manner. Intermediate models obtained from the 
hierarchical modeling approach can naturally be used for LODs. 
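
Once several LODs exist for an object, the run-time system needs a rule for choosing among them. The C++ sketch below shows one very simple rule based on a per-object polygon budget: pick the most detailed model that still fits. The names Lod and selectLod are hypothetical, and real systems typically also weigh factors such as viewing distance, screen coverage, and object importance, which are left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One level of detail of an object; only the polygon count matters here.
struct Lod {
    int polygonCount;
};

// lods must be ordered from most detailed to least detailed. Returns the
// index of the most detailed LOD whose polygon count fits the budget, or
// the coarsest LOD if none fits.
std::size_t selectLod(const std::vector<Lod>& lods, int polygonBudget) {
    assert(!lods.empty());
    for (std::size_t i = 0; i < lods.size(); ++i) {
        if (lods[i].polygonCount <= polygonBudget)
            return i;
    }
    return lods.size() - 1;
}
```

With LODs of 10000, 2500, and 400 polygons, a budget of 3000 selects the 2500-polygon model, whereas a budget of 100 falls back to the 400-polygon model because nothing coarser is available. The budget itself could be lowered or raised from frame to frame as the system load changes.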

Scene Construction 

Once the objects' forms and behaviors (see next sections for behavior
modeling) are roughly modeled, they need to be "put in place" in the 3D space to
compose a scene (at least an initial one). Note that the scene can change
over time as objects move, are newly introduced, or are destroyed. In order to
place an object in a scene, there has to be a fixed standard reference 
coordinate system. Any graphics or VR platform assumes the existence of 
such a coordinate system, usually called the "World" coordinate system. 
The locations and orientations of all the virtual objects need to be set (or 
converted to) with respect to this assumed World coordinate system, because 
the graphics hardware, for instance, assumes that all object coordinates are 
with respect to that of the World, and projects them to the screen. 

Object Placement by Series of Action 

In order to specify a position and orientation of an object, the object itself 
must possess a coordinate system on its own, called the object or local 
coordinate system. The local coordinate system is (most usually) attached 



4 Most popular geometric modelers include a mesh simplification functionality. 
A good survey of simplification algorithms is given in [Lub99].






somewhere on the object itself and in a "convenient" orientation at modeling
time by the user. In a sense, this local coordinate system represents the
object (thus, it makes sense that the local coordinate system be placed in a
"strategic" position on the object). For instance, a virtual desk object might
have a local coordinate system placed in the corner of its flat top (so the
origin of the object coordinate system would be here) with its x-axis aligned
to the length, y-axis aligned to the depth, and z-axis pointing perpendicular
to the flat top (see Figure 3.10). However, in principle it does not matter
where the object coordinate system is located or how it is oriented.

The position and orientation of the object "in the world" are defined by 
where and how the "object's coordinate system" is placed in relation to the 
"world coordinate system." This analogy can be extended to an object-to- 
object relationship. That is, the position and orientation of one object with 
respect to another object can be defined by where and how one object's 
coordinate system is placed in relation to the other object's coordinate system. 
In such a case, two coordinate systems form a parent (reference)-child rela- 
tionship. Cascaded parent-child relationships among objects (or equivalently 
their coordinate systems) can be formed, with the ultimate parent being the 
"World" coordinate system (we come back to this issue later). 

Given an object coordinate system and its reference or parent (e.g., 
World) coordinate system, the object's location can be specified by stating 
how to place the object coordinate system origin at the desired location with 
respect to the reference coordinate system. Initially, the object (or equiva- 
lently its local coordinate system) is assumed to be at, and aligned with, its 
parent coordinate system. Thus, as shown in Figure 3.10, the location of 
object O by default would be at coordinate (0, 0, 0), then respecified to be at 
coordinate (-10, 30, 20) by simply adding the displacement of (-10, 30, 20), 
or equivalently moving O by -10 in the x-direction, 30 in the y-direction,
and 20 in the z-direction of the World coordinate system.




Figure 3.10. The World coordinate system and the object/local coordinate system for 
the desk. 

Note that although this new position is (-10, 30, 20) with respect to the
parent coordinate system, it is also at (0, 0, 0) with respect to the local object
coordinate system. To avoid confusion, we attach a superscript (such as R
for the "Reference," or O, "Object") to the coordinates to indicate their
reference coordinate system. Thus, after the movement, coordinates
${}^{O}(0, 0, 0)$ and ${}^{R}(-10, 30, 20)$ refer to the same point, the (new)
location of the object. If the object needs to be moved again, for instance by
(10, 10, 10), then the displacement is simply added to the current location,
producing ${}^{R}(0, 40, 30)$.

By the same logic, the object would always be located at the origin of its
own local coordinate system, ${}^{O}(0, 0, 0)$. What about the location of a
"local" point (or vertex) ${}^{O}(px, py, pz)$ on this object? Assuming that the
object does not change its shape during the movement or placement operation
(i.e., rigid motion without deformation), its local coordinates will
remain the same. However, coordinates of the local point with respect to
the parent World coordinate system will change by the amount of the
movement. That is, ${}^{O}(px, py, pz)$ and ${}^{R}(px - 10, py + 30, pz + 20)$
refer to the same point in space.

The operation is represented by a translation operator, Trans, which takes
the old coordinates and a displacement and returns their sum.

Note that the operator works within a single coordinate system, so it is
safe to drop the superscript R. VR programming libraries include
variations of the movement operator. For instance, a hypothetical call such
as Move(object1, 10, 20, 30) will carry out the operation above, adding the
displacement to the current location of object1 with respect to its parent
coordinate system. Another hypothetical call such as
Absolute_move(object1, 10, 20, 30) would place object1 at location (10, 20,
30) with respect to the origin of the parent coordinate system of object1.
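As a rough illustration, the two hypothetical calls can be sketched in C++ as follows. The types Vec3 and Object and all function names are assumptions made for this sketch; they are not taken from any particular VR library, and rotation is ignored.

```cpp
#include <cassert>

// Hypothetical minimal types for illustration only.
struct Vec3 {
    double x, y, z;
};

struct Object {
    // Origin of the object's local frame, expressed in its parent frame.
    Vec3 position{0.0, 0.0, 0.0};
};

// Relative move: X_new = X_old + displacement.
void move(Object& obj, double dx, double dy, double dz) {
    obj.position.x += dx;
    obj.position.y += dy;
    obj.position.z += dz;
}

// Absolute move: place the object at (x, y, z) in the parent frame.
void absoluteMove(Object& obj, double x, double y, double z) {
    obj.position = Vec3{x, y, z};
}

// Parent-frame coordinates of a point given in the object's local frame
// (translation only; the object is assumed not to be rotated).
Vec3 localToParent(const Object& obj, const Vec3& p) {
    return Vec3{obj.position.x + p.x, obj.position.y + p.y, obj.position.z + p.z};
}
```

Moving a fresh object by (-10, 30, 20) and then by (10, 10, 10) leaves it at (0, 40, 30) in the parent frame, while every local point keeps its local coordinates, as described in the text.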

Written out, the translation operator Trans discussed above is

$$ {}^{R}X_{New} = Trans({}^{R}X_{Old}, displacement) = {}^{R}X_{Old} + \begin{pmatrix} dx \\ dy \\ dz \end{pmatrix} $$

In addition to the position, the orientation of an object also needs to be
specified. One method of doing this is to describe by how much to rotate
its own (local) coordinate system around the axes of the parent (or reference)
coordinate system. For instance, we might rotate 5 the object (or equivalently
its local coordinate system) about the x-axis of the parent reference
coordinate system (which is not moving) by some value α, then rotate
about the y-axis by β, and finally rotate about the z-axis by γ. If we
applied the rotation by each axis in a different order, the object would be
oriented differently, as depicted in Figure 3.11. This stems from the fact that
mathematically applying rotation transformation is not commutative, that
is, the order of the application matters.

Figure 3.11. The order of applying rotation matters.

5 All rotations are according to the right-hand rule (i.e., with the right
thumb directed toward the positive direction of the axis, the remaining fingers
curl in the counterclockwise, positive direction of rotation).

Thus, to conveniently specify the orienting operation without confusion,
one often uses the convention called the Fixed axis angles. Fixed axis angles
are a set of three angles, α, β, and γ. When these three values are given as an
orientation specification between two coordinate systems, one reference and
the other rotated, it is understood that the object is rotated by α about the
x-axis of the reference coordinate system, then by β about the y-axis, and then
by γ about the z-axis, "in that order." Similarly to the translation, the
rotation process can be modeled as applying a mathematical operator (see below).



$$ {}^{R}X_{New} = FixedAxis\_Rot({}^{R}X_{Old}, {}^{R}axis\_set, \alpha, \beta, \gamma) $$

where ${}^{R}axis\_set$ denotes the three axes of the (reference) coordinate
system R (sometimes, the axis order might be explicitly specified).

Similarly to object moving, when initially specifying the orientation of an
object, it is assumed to be aligned with its reference coordinate system. Thus, as
shown in Figure 3.12, the pose of object O1 by default would be aligned to the
axes of the reference coordinate system. The three rotations are then
applied to newly orient the object. The Fixed axis rotation operation can be
mathematically represented as applying three linear transformation matrices
in series, to any point of the object expressed in its own local coordinate system.

[Figure 3.12 label: Rotating by Fixed axis convention, α = 30, β = 40, γ = 60]
Figure 3.12. Orientation specification with Fixed axis angles.



$$ {}^{R}X_{New} = FixedAxis\_Rot({}^{R}X_{Old}, {}^{R}axis\_set, \alpha, \beta, \gamma) = Rot({}^{R}z, \gamma) * Rot({}^{R}y, \beta) * Rot({}^{R}x, \alpha) * {}^{R}X_{Old} $$

$$ {}^{R}X_{New} = {}^{R}Rot_{Fixed\,axis} * {}^{R}X_{Old} \quad \text{(in a single matrix format)} $$

$$ X_{New} = Rot_{Fixed\,axis} * X_{Old} $$

where Rot(i, w) is a linear function representing an operation to rotate
around the axis i by w degrees. Note that the expression operates with
respect to one coordinate system (in this case the R). Thus, it is safe to
drop the superscript R. Rot(z, γ), Rot(y, β), and Rot(x, α) are given by the
following simple formulas (in matrix forms).



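$$ Rot({}^{R}z, \gamma) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} $$

$$ Rot({}^{R}y, \beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} $$

$$ Rot({}^{R}x, \alpha) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix} $$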
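The Fixed axis rotation and the effect of the rotation order can be tried out with a small C++ sketch. The types Vec3 and Mat3 and all function names are assumptions made for this illustration, not part of any VR library; angles are given in degrees, as in the text.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Mat3 { double m[3][3]; };

const double kPi = 3.14159265358979323846;

// Matrix product a * b (b acts first when the result multiplies a point).
Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 r{};  // zero-initialized
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Matrix times column vector.
Vec3 apply(const Mat3& a, const Vec3& p) {
    return Vec3{a.m[0][0] * p.x + a.m[0][1] * p.y + a.m[0][2] * p.z,
                a.m[1][0] * p.x + a.m[1][1] * p.y + a.m[1][2] * p.z,
                a.m[2][0] * p.x + a.m[2][1] * p.y + a.m[2][2] * p.z};
}

Mat3 rotX(double deg) {
    const double c = std::cos(deg * kPi / 180.0), s = std::sin(deg * kPi / 180.0);
    return Mat3{{{1, 0, 0}, {0, c, -s}, {0, s, c}}};
}

Mat3 rotY(double deg) {
    const double c = std::cos(deg * kPi / 180.0), s = std::sin(deg * kPi / 180.0);
    return Mat3{{{c, 0, s}, {0, 1, 0}, {-s, 0, c}}};
}

Mat3 rotZ(double deg) {
    const double c = std::cos(deg * kPi / 180.0), s = std::sin(deg * kPi / 180.0);
    return Mat3{{{c, -s, 0}, {s, c, 0}, {0, 0, 1}}};
}

// Fixed axis convention: rotate about x by alpha, then y by beta, then z by gamma.
Mat3 fixedAxisRot(double alpha, double beta, double gamma) {
    return mul(rotZ(gamma), mul(rotY(beta), rotX(alpha)));
}
```

For example, fixedAxisRot(90, 0, 90) carries the point (1, 0, 0) to approximately (0, 1, 0), whereas applying the z rotation first, mul(rotX(90), rotZ(90)), carries the same point to approximately (0, 0, 1).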



Fixed axis angle convention is needed when all three orthogonal axes are
involved in sequence in specifying the relative orientation between two
coordinate systems. However, rotation around an individual axis (whether
the axis is one of the reference coordinate axes or an arbitrary vector) can be
specified with the axis vector and the amount of rotation (e.g., degrees) as
one mathematical operator. Expressing the rotation axis as a vector
$k = {}^{R}(kx, ky, kz)$ in the reference coordinate system and the amount of
rotation as θ, the rotation is represented by a mathematical operator:

$$ {}^{R}X_{New} = {}^{R}Rot\_by\_single\_axis({}^{R}k, \theta) * {}^{R}X_{Old} $$

Without confusion, the superscript R can be dropped:

$$ X_{New} = Rot\_by\_single\_axis(k, \theta) * X_{Old} $$

The formula for Rot_by_single_axis is given by (in matrix form, with k taken
to be a unit vector) [Cra86]

$$ Rot\_by\_single\_axis(k, \theta) = \begin{pmatrix} kx\,kx\,v\theta + c\theta & kx\,ky\,v\theta - kz\,s\theta & kx\,kz\,v\theta + ky\,s\theta \\ kx\,ky\,v\theta + kz\,s\theta & ky\,ky\,v\theta + c\theta & ky\,kz\,v\theta - kx\,s\theta \\ kx\,kz\,v\theta - ky\,s\theta & ky\,kz\,v\theta + kx\,s\theta & kz\,kz\,v\theta + c\theta \end{pmatrix} $$

where $c\theta = \cos\theta$, $s\theta = \sin\theta$, and $v\theta = 1 - \cos\theta$.
Note that to put the rotated object back in its original position, the inverse
of the rotation operator is applied:

$$ ({}^{R}Rot_{Fixed\,axis})^{-1} = Rot({}^{R}x, -\alpha) * Rot({}^{R}y, -\beta) * Rot({}^{R}z, -\gamma) $$

or,

$$ (Rot\_by\_single\_axis(k, \theta))^{-1} = Rot\_by\_single\_axis(k, -\theta) $$
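The single-axis operator and its inverse can be sketched in C++ as below. The names Vec3, Mat3, mul, apply, and rotBySingleAxis are assumptions for this illustration. The normalization of k is an addition of this sketch (the matrix formula presumes a unit axis), and a nonzero k is assumed.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Mat3 { double m[3][3]; };

// Matrix product a * b.
Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 r{};  // zero-initialized
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Matrix times column vector.
Vec3 apply(const Mat3& a, const Vec3& p) {
    return Vec3{a.m[0][0] * p.x + a.m[0][1] * p.y + a.m[0][2] * p.z,
                a.m[1][0] * p.x + a.m[1][1] * p.y + a.m[1][2] * p.z,
                a.m[2][0] * p.x + a.m[2][1] * p.y + a.m[2][2] * p.z};
}

// Rotation about an arbitrary axis k by an angle in degrees.
// k is normalized first; it must not be the zero vector.
Mat3 rotBySingleAxis(Vec3 k, double deg) {
    const double kPi = 3.14159265358979323846;
    const double len = std::sqrt(k.x * k.x + k.y * k.y + k.z * k.z);
    k.x /= len; k.y /= len; k.z /= len;
    const double c = std::cos(deg * kPi / 180.0);
    const double s = std::sin(deg * kPi / 180.0);
    const double v = 1.0 - c;
    return Mat3{{{k.x * k.x * v + c,       k.x * k.y * v - k.z * s, k.x * k.z * v + k.y * s},
                 {k.x * k.y * v + k.z * s, k.y * k.y * v + c,       k.y * k.z * v - k.x * s},
                 {k.x * k.z * v - k.y * s, k.y * k.z * v + k.x * s, k.z * k.z * v + c}}};
}
```

With k = (0, 0, 1) the matrix reduces to Rot(z, θ), and multiplying rotBySingleAxis(k, θ) by rotBySingleAxis(k, -θ) gives the identity up to rounding, which is the inverse relation stated above.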



There is another rotation operator called the Euler rotation operator. In the
Euler rotation, the rotation is applied with respect to the "rotating" local
coordinate system of the object. For instance, one can rotate around the
local z axis (which is initially coincident with the z axis of the reference
coordinate system) by γ, then around the local y axis (which by now would
have rotated from its original pose and be different from the fixed reference y
axis) by β, then finally around the local x axis by α. The operation can be
represented as below (note that to be clear the axis order is explicitly
specified as a subscript). It so happens that the Euler rotation operation is
equivalent to the Fixed axis rotation when the axis rotation order is reversed
as shown in the formula below (derivation is not shown).

$$ {}^{R}X_{New} = Rot_{Euler\,ZYX}({}^{R}X_{Old}, {}^{R}axis\_set, \alpha, \beta, \gamma) = Rot_{Fixed\,axis\,XYZ}({}^{R}X_{Old}, {}^{R}axis\_set, \alpha, \beta, \gamma) $$
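Although the derivation is omitted in the text, a brief sketch of why the two conventions agree may help. It relies on one standard fact, stated here as an assumption of the sketch: if the object currently has orientation E, a rotation about one of its own (already rotated) axes equals $E \, Rot(axis, angle) \, E^{-1}$ when expressed in the fixed reference frame. The intermediate names $R_i$ and $E_i$ are introduced only for this illustration.

```latex
\begin{aligned}
\text{Fixed XYZ:}\quad & R_1 = Rot(x,\alpha),\\
& R_2 = Rot(y,\beta)\,R_1,\\
& R_3 = Rot(z,\gamma)\,R_2 = Rot(z,\gamma)\,Rot(y,\beta)\,Rot(x,\alpha),\\[4pt]
\text{Euler ZYX:}\quad & E_1 = Rot(z,\gamma),\\
& E_2 = \bigl(E_1\,Rot(y,\beta)\,E_1^{-1}\bigr)\,E_1 = Rot(z,\gamma)\,Rot(y,\beta),\\
& E_3 = \bigl(E_2\,Rot(x,\alpha)\,E_2^{-1}\bigr)\,E_2 = Rot(z,\gamma)\,Rot(y,\beta)\,Rot(x,\alpha) = R_3.
\end{aligned}
```

In words, rotations about fixed axes accumulate by multiplying on the left, rotations about the moving local axes accumulate by multiplying on the right, and so reversing the axis order yields the same matrix product.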

Similarly to the translation case, VR programming libraries contain variations
of the rotation operators shown above. A hypothetical operator such
as Fixed_Axis_Rotate(object1, α, β, γ) would rotate object1 with respect to
its parent coordinate system by the Fixed axis convention (with axis order of
xyz) by α, β, and γ (and similarly for EulerZYX_Rotate(object1, α, β, γ)).
Another hypothetical operator such as Rotate_axis(object1, k, θ) would rotate
object1 about vector k by angle θ.

To summarize, one way to specify the location and orientation of an
object with respect to another reference coordinate system is to express
them in a series of operators (and associated parameter values) applied to
the object with respect to the reference. Note that the order of application is
important. Motion usually involves both translation and rotation, and in
applying rotations and translations, the resultant position or orientation will
be different depending on whether the rotation is applied first or the
translation is applied first (we come back to this issue again). Figure 3.13
shows example library calls available in SGI Performer and the required
parameters for rotating or translating an object.

Object Moving

pfGeoSet *cubeGset;
pfGeode *geode;
pfDCS *objectDCS;

// make unit cube at (0,0,0)
cubeGset = pfdNewCube(arena);
geode->addGSet(cubeGset);

// attach cube to parent node in scene graph
objectDCS->addChild(geode);

// move cube to position x = 10, y = 20, z = 30
objectDCS->setTrans(10, 20, 30);

Object Rotating

pfGeoSet *cubeGset;
pfGeode *geode;
pfDCS *objectDCS;

// make unit cube at (0,0,0)
cubeGset = pfdNewCube(arena);
geode->addGSet(cubeGset);

// attach cube to parent node in scene graph
objectDCS->addChild(geode);

// rotate cube; Euler angles = (90, 0, 45)
objectDCS->setRot(90, 0, 45);

Figure 3.13. The SGI Performer library calls (and required parameters) for
positioning (or equivalently moving) and orienting (or equivalently rotating)
an object.



Dealing with Multiple Frames of Reference 

So far, even though we talked a little bit about different types of coordinate 
systems (such as reference and local), we have really operated within one 
coordinate system, moving or rotating an element (e.g., vector, point, etc.) 
expressed in one coordinate system to a new location in the same coordinate 
system. Although it is most convenient, during motion simulation (which is 
the specification of position or orientation in time) or rendering (part of 
which is projecting objects to a display screen), that all involved objects 
be expressed in one frame of reference, virtual objects, in general, are not 
expressed in just one (for instance, the World) coordinate system. 

In fact, all the coordinates of the primitives (e.g., polygons, faces, edges, 
vertices, etc.) that make up an object are (initially) expressed with respect to 
the object coordinate system. For instance, a coordinate of a vertex (1, 1, 1) 
that belongs to objectl will be with respect to objectl's coordinate system, 
whereas another vertex (1, 1, 1) that belongs to object2 will be with respect to 
object2's coordinate system. Thus, although they both have the representa- 
tion (in fact, a confusing one) of (1, 1, 1), they refer to different positions in 
the (whole) space. Thus, as already mentioned, when specifying a coordinate,
to avoid confusion, one must state the coordinate system to which it
belongs (or the reference coordinate system).

Figure 3.14. Hierarchy (or chain) of relations among objects (and their
coordinate systems).

Similarly, a position and orientation of an object with its own local 
coordinate system may be specified with respect to another object coordinate 
system (we have learned how to do this in the previous section). A typical 
case is when an object contains subobjects (as in a desk having four legs as 
distinct subobjects; see Figure 3.14). The subobject locations and orienta- 
tions are conveniently specified with respect to the parent object (the parent 
object becomes the reference instead of the World). For instance, it is more 
natural to specify the leg of the desk as being located at the corner of the flat 
top, that is, in reference to the flat top object, rather than in reference to 
some other unrelated entity. Thus, a cascade (or hierarchy) of interobject (or 
intercoordinate) relations can be formed with the World coordinate system 
being the final, root, and absolute reference coordinate system among ob- 
jects in a scene as shown in Figure 3.14. In essence, this hierarchy expresses 
the whole scene, the constituent objects, and their location and orientations. 
Thus, what we need is a mechanism to express locations and orientations of 
all objects expressed with respect to different coordinate systems ultimately 
into a single unified one (e.g., the World) so that one can easily carry out 
certain computations for display or simulation. 



Re-Expressing Coordinates 

We have seen that rotation operators can be expressed in a matrix format, 
and rotation matrices have a nice property that a product of rotation 
matrices results in another (one) rotation matrix (i.e., a series of rotations 
around respective axes is equivalent to one rotation). In fact, a rotation 
matrix between two 3D coordinate systems is defined as a matrix composed
of three (unit) column vectors, each representing the axis of one coordinate 
system with respect to the other coordinate system. This rotation matrix is 
equivalent to the matrix that is formed by multiplying the series of rotations 
(in order) to make one coordinate system coincide with the other (as we have 
explained so far). By the theories of linear algebra, a rotation matrix carries 
out a change of basis. That is, multiplying a rotation matrix by a coordinate 
expressed in one coordinate system changes the coordinate to be expressed in 
another coordinate system (see Figure 3.15). 

Recall, for instance, that $Rot_{Euler}$ rotates an object originally at
${}^{R}X_{Old}$ into a new location, ${}^{R}X_{New}$. Thus, there are two
interpretations for what the rotation matrix can do, as illustrated in
Figure 3.15. One is re-expressing the coordinates of an entity and the other
is to rotate the entity in the same coordinate system (e.g., Euler rotation).
To express the capability of the coordinate re-expression (or change of basis),
we put a super- and subscript in the rotation matrix: in the left top, the
target coordinate system, and in the lower right, the source coordinate system.
The same interpretation is possible for translation as well. Figure 3.16
illustrates that a displacement between two translated coordinate frames can be
used to express the position of an entity with respect to the two coordinate
frames, or equivalently two different positions within one coordinate system.

[Figure 3.15, panel (a): a vector a can be expressed with respect to coordinate
system O as ${}^{O}a$ or with respect to coordinate system R as ${}^{R}a$.
Conversion is possible by a rotation matrix, ${}^{R}a = {}^{R}Rot_{O} * {}^{O}a$,
where the three columns of ${}^{R}Rot_{O}$ are (1) the x axis, (2) the y axis,
and (3) the z axis of coordinate system O expressed in R. Panel (b): the vector
a is rotated within R, ${}^{R}a_{new} = Rot_{Euler} * {}^{R}a_{old}$, with
$Rot_{Euler} = {}^{R}Rot_{O}$.]

Figure 3.15. Two interpretations of the rotation matrix: (a) an entity is re-expressed in
another coordinate system that is rotated from the reference. The conversion is made
possible by multiplying a rotation matrix defined between the two frames; (b) the same
rotation matrix can be interpreted to rotate an entity within one coordinate system.

Figure 3.16. Two interpretations of the displacement between two coordinate
systems: (a) an entity is re-expressed in another coordinate system that is translated from
the reference; (b) the same displacement can be used to translate the entity within one
coordinate system.

To reiterate, by the theory of linear algebra, we can express a rotation
matrix, ${}^{1}Rot_{2}$, using the bases of coordinate systems 1 and 2.
${}^{1}Rot_{2}$, as a matrix, will contain in its columns the basis of
coordinate system 2 expressed in the basis of coordinate system 1. For instance,

$$ {}^{1}Rot_{2}({}^{1}x, 90) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} $$

We can see that the columns of this matrix express the axes of coordinate
system 2 in terms of its reference basis: (1, 0, 0), (0, 1, 0), and (0, 0, 1). What
about the reverse operation, ${}^{2}Rot_{1}({}^{2}x, \theta = -90)$? The reverse
operation can be found by constructing this matrix with its columns as the basis of
coordinate system 1 expressed in 2 (we can figure out what θ will amount
to; that is, rotating 2 with respect to 1 about the x-axis by 90 degrees is
equivalent to rotating 1 with respect to 2 about the x-axis by -90 degrees).
It turns out, by the theories of linear algebra, because the columns of
rotation matrices are orthonormal, their inverses are just their transposes;
that is,

$$ {}^{2}Rot_{1}({}^{2}x, \theta = -90) = {}^{2}Rot_{1} = {}^{1}Rot_{2}^{-1} = {}^{1}Rot_{2}^{T} $$
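The change-of-basis reading of the rotation matrix, and the fact that its inverse is its transpose, can be checked with a short C++ sketch. The names Vec3, Mat3, rotFromAxes, transpose, and apply are assumptions made for this illustration.

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };
struct Mat3 { double m[3][3]; };

// Build the rotation matrix from frame 2 to frame 1: its columns are the
// x, y, and z axes of frame 2 expressed in frame 1.
Mat3 rotFromAxes(const Vec3& xAxis, const Vec3& yAxis, const Vec3& zAxis) {
    return Mat3{{{xAxis.x, yAxis.x, zAxis.x},
                 {xAxis.y, yAxis.y, zAxis.y},
                 {xAxis.z, yAxis.z, zAxis.z}}};
}

// For a rotation matrix the transpose is also the inverse.
Mat3 transpose(const Mat3& a) {
    Mat3 t{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            t.m[i][j] = a.m[j][i];
    return t;
}

// Matrix times column vector: re-expresses p in the target frame.
Vec3 apply(const Mat3& a, const Vec3& p) {
    return Vec3{a.m[0][0] * p.x + a.m[0][1] * p.y + a.m[0][2] * p.z,
                a.m[1][0] * p.x + a.m[1][1] * p.y + a.m[1][2] * p.z,
                a.m[2][0] * p.x + a.m[2][1] * p.y + a.m[2][2] * p.z};
}
```

For the 90-degree example above, the axes of frame 2 expressed in frame 1 are (1, 0, 0), (0, 0, 1), and (0, -1, 0). A point at (0, 1, 0) in frame 2 is re-expressed as (0, 0, 1) in frame 1, and applying the transpose takes it back to (0, 1, 0).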

Although the rotation matrix allows us to convert coordinates between 
two coordinate systems with different orientations, it does not account for 
translation. Because movements (or specifications of an object location)
usually involve both translation and rotation, it is convenient to combine
the vector addition/subtraction (for translation) and matrix multiplication 
(for rotation) into one compact matrix representation (which is a 4 x 4 
matrix as we show). In order to accomplish this, the homogeneous coordin- 
ate system is used to represent entities in the 3D space (e.g., position, vector) 
upon which the 4x4 matrix is applied. With homogeneous coordinates we 
add a 1 as a fourth component to represent vectors in 3D space, and the 
4x4 matrix includes a fourth row (0, 0, 0, 1) to handle this additional 
component. Representing a 3D entity with such a fourth component does 
have a mathematical meaning, but this is not treated in this book. For now, 
we accept the convention for the simple purpose of convenience. Thus, the 
4x4 matrix (or better termed as 4 x 4 transformation matrix) includes the 
rotation matrix as its first submatrix and the amount of translation in its 
fourth column as shown in the following. 



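$$ {}^{2}x = {}^{2}T_{1} * {}^{1}x $$

where

$$ {}^{2}T_{1} = \begin{pmatrix} {}^{2}Rot_{1} & \begin{matrix} dx \\ dy \\ dz \end{matrix} \\ \begin{matrix} 0 & 0 & 0 \end{matrix} & 1 \end{pmatrix} \quad \text{and} \quad {}^{1}x = \begin{pmatrix} a \\ b \\ c \\ 1 \end{pmatrix} $$
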

Note that the 4 x 4 transformation matrix represented this way applies
rotation, then the translation. The result would be different if translation
were applied first. This is easily seen in its decomposition into components as
also seen below. Here, $T_{Trans}$ is a 4 x 4 transformation matrix that applies
only translation (with its rotation submatrix equal to the identity matrix I), and
likewise, $T_{Rot}$ is a 4 x 4 transformation matrix that applies only rotation
(with its d component 0).

$$ {}^{2}T_{1} = T_{Trans} * T_{Rot} \neq T_{Rot} * T_{Trans} $$

where

$$ T_{Trans} = \begin{pmatrix} I & \begin{matrix} dx \\ dy \\ dz \end{matrix} \\ \begin{matrix} 0 & 0 & 0 \end{matrix} & 1 \end{pmatrix} \quad \text{and} \quad T_{Rot} = \begin{pmatrix} Rot & \begin{matrix} 0 \\ 0 \\ 0 \end{matrix} \\ \begin{matrix} 0 & 0 & 0 \end{matrix} & 1 \end{pmatrix} $$
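A minimal C++ sketch of the 4 x 4 homogeneous transformation, showing that the order of translation and rotation matters, is given below. The types Vec4 and Mat4 and the function names are assumptions for this illustration; only a rotation about the z axis is implemented to keep the example short.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { double v[4]; };     // homogeneous point (x, y, z, 1)
struct Mat4 { double m[4][4]; };  // 4 x 4 transformation matrix

Mat4 identity() {
    Mat4 r{};  // zero-initialized
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0;
    return r;
}

// Matrix product a * b (b acts first when the result multiplies a point).
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

Vec4 apply(const Mat4& a, const Vec4& p) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r.v[i] += a.m[i][k] * p.v[k];
    return r;
}

// Translation only: identity rotation block, displacement in the fourth column.
Mat4 translation(double dx, double dy, double dz) {
    Mat4 r = identity();
    r.m[0][3] = dx; r.m[1][3] = dy; r.m[2][3] = dz;
    return r;
}

// Rotation only (about z, in degrees): zero displacement.
Mat4 rotationZ(double deg) {
    const double kPi = 3.14159265358979323846;
    const double c = std::cos(deg * kPi / 180.0), s = std::sin(deg * kPi / 180.0);
    Mat4 r = identity();
    r.m[0][0] = c; r.m[0][1] = -s;
    r.m[1][0] = s; r.m[1][1] = c;
    return r;
}
```

Applied to the point (1, 0, 0), mul(translation(10, 0, 0), rotationZ(90)) rotates first and then translates, giving approximately (10, 1, 0), whereas mul(rotationZ(90), translation(10, 0, 0)) translates first and gives approximately (0, 11, 0).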



The 4x4 transformation matrices are conveniently used to convert various 
entities represented in different coordinate systems into another. In fact, 
whenever objects are to be rendered to the screen by the hardware, all objects 
must be ultimately converted into the World coordinate system (once every- 
thing is converted into the World coordinates, then they are projected to the 
screen coordinates). Thus, directly encoding the 4x4 transformation matrix 
(i.e., 12 elements) between two coordinate systems is another way of ex- 
pressing relative object placement, instead of using Fixed axis or Euler 
angles, or thinking of it as movements from the default coordinate systems. 6 
On the other hand, when carrying out simulation or motion calculation, it is 
convenient to express or interpret the process with the action-oriented 
approach (such as Euler rotation and translation operators). 

Because the 4x4 transformation matrix conveniently captures both the 
rotation and translation between two given coordinate systems, the entire 
scene can be structured in a chain of 4 x 4 transformation matrices (as 
shown in Figure 3.17) that specify the physical relationship among them. 
A changing scene can be implemented by changing the values of the 4 x 4 
matrices so that the objects move and rotate in space with respect to time 
(see Figure 3.18). Most VR programming and development environments 
use a hierarchical data structure called the scene graph 7 that organizes the 
scene by a tree, where the coordinate systems of the children objects (con- 
stituents of the parent object) are defined relative to that of their parent 
object. When a motion is applied to the parent node, all of its children are 
affected by it as well. 
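The idea of a chain of transformation matrices can be sketched in C++ as follows. This is only an illustration under simplifying assumptions: the Node structure, its field names, and the function worldTransform are hypothetical and much simpler than the scene graph of any real library, and only translations are used so that the numbers are easy to follow.

```cpp
#include <cassert>
#include <string>

struct Mat4 { double m[4][4]; };

Mat4 identity() {
    Mat4 r{};  // zero-initialized
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0;
    return r;
}

Mat4 translation(double dx, double dy, double dz) {
    Mat4 r = identity();
    r.m[0][3] = dx; r.m[1][3] = dy; r.m[2][3] = dz;
    return r;
}

Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Hypothetical scene-graph node: a pose relative to its parent.
struct Node {
    std::string name;
    Mat4 local;          // transformation from this node's frame to the parent's
    const Node* parent;  // nullptr for the World root
};

// Walk up the chain and multiply: World <- ... <- parent <- node.
Mat4 worldTransform(const Node& n) {
    if (n.parent == nullptr) return n.local;
    return mul(worldTransform(*n.parent), n.local);
}
```

With a desk placed at (-10, 30, 20) in the World and a leg placed at (1, 2, 0) relative to the desk, the leg's origin ends up at (-9, 32, 20) in World coordinates. Changing only the desk's local matrix moves the leg as well, which is the parent-child behavior described above.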



Function/Behavior Modeling 

Low-level simulation programming constructs or libraries are most often
used to add behavior. Such software packages allow relatively simple
encoding of the functional/behavioral aspect of the virtual environment, by
hiding and abstracting out low-level details and providing easy-to-use APIs
for the programmers. These programming libraries also usually support
object-oriented programming, communication with popular VR devices,
and capabilities to import various model file formats. Direct programming
is the most straightforward method of encoding virtual object behavior (or
even scenewide behaviors); however, there are other possibilities.

6 The 12 elements include 9 elements in the rotation matrix and 3 elements in the
translation vector. Relative rotation can be expressed with fewer elements using
another mathematical construct called the quaternions.

7 The scene graph includes the hierarchical structure of the scene/objects plus other
additional information such as lighting, cameras, object specific properties, behavior,
and so on.

Figure 3.17. Hierarchy (or chain) of relations among objects (and their coordinate
systems) using the 4 x 4 transformation matrices.

Figure 3.18. Physical relationship expressed in 4 x 4 transformation matrices
changing in time to express object movements.

Many game development platforms offer scriptlike languages, simpler 
than programming languages, to specify behaviors. With simple scripts, it 
is sometimes difficult to express complex interobject constraints and rela- 
tions. Sometimes it is also convenient to encode object behaviors in terms of 
constraints among the virtual objects (e.g., if one object moves, the others 
follow). Expressing object function or behavior using Data Flow Diagrams 
(DFD) or state-based representations can be useful in overcoming this 
problem. A DFD specifies a single process or a function in terms of how it 
can be decomposed into a number of subprocesses or objects and their input/ 
output relationships. For instance, the DFD is useful for specifying device 
interface behaviors, how the raw sensor data are transformed into a usable 
form by the number of intermediate filtering and conversion processes. The 
state-based representations can express time- or event-based coordinated 
behaviors among virtual objects very well. Many Artificial Intelligence 
(AI) behaviors are modeled using state-based representations such as 
finite-state machines. 

However, in reality, direct programming remains the most prevalent form
of function/behavior implementation. This is partly because high-level
authoring tools and constructs tend to abstract away too many details, which
makes performance tuning and addressing presence difficult and leaves these
tools inappropriate for anything other than small-scale, proof-of-concept
prototyping. Many developers still prefer to merely use the API in
a creative manner, or even use very low-level graphic system packages (e.g.,
OpenGL, DirectX 8 ) in order to apply various optimization tricks. As such,
these different styles of representational constructs (i.e., scripts, constraints,
reusable components, state-based representations) are widely used as a way
of planning and documenting behavior prior to diving into programming.
In the next section, we take a closer look at using these constructs to specify
the important parts of a system during development; implementation is
discussed in the next chapter.

Example: Ship Simulator Revisited (Level 1 Form 
and Function/Behavior) 

Figures 3.19 and 3.20 show initial scene graphs and geometric models for the 
Ship Simulator example introduced in Chapter 2. The whole world is com- 
posed of, for now, simply MyShip and OtherShips. MyShip is in turn 
composed of the EngineTelegraph (lever) and the Steering Wheel. Note that
some objects are omitted for simpler illustration purpose (such as the deck
and windows of MyShip and other simple environment objects such as the
sky and terrain).

Figure 3.20. Initial geometrical models for MyShip and OtherShip.

Figures 3.21 through 3.23 illustrate the specifications of the behaviors of 
three main objects: MyShip, OtherShip, and Camera. Upon start of the 
whole system (Ship Simulator), MyShip automatically enters an initializing 
state and then transits into four concurrent states waiting for User's key- 
board input. With the key input, appropriate actions are taken (script-based 
specification for this part is given on the companion CD). The specification 
for OtherShip is similar. Upon initialization, it automatically starts to make 
new targets and navigate toward them. A new target is regenerated at a fixed 
period of time. 
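The OtherShip behavior described above can be sketched as a small finite-state machine in C++. The state names, the member names, and the retargeting period are assumptions made for this illustration; the actual statechart is the one in Figure 3.22, and the actual scripts are on the companion CD.

```cpp
#include <cassert>

// Hypothetical states for the OtherShip behavior.
enum class ShipState { Initializing, Navigating };

struct OtherShip {
    ShipState state = ShipState::Initializing;
    double timeSinceTarget = 0.0;  // time elapsed since the last target was made
    int targetsMade = 0;           // counts generated targets
    double retargetPeriod;         // fixed period between new targets

    explicit OtherShip(double period) : retargetPeriod(period) {}

    // Placeholder for picking a new destination.
    void makeNewTarget() {
        ++targetsMade;
        timeSinceTarget = 0.0;
    }

    // Advance the state machine by dt time units.
    void update(double dt) {
        switch (state) {
        case ShipState::Initializing:
            // On initialization, make the first target and start navigating.
            makeNewTarget();
            state = ShipState::Navigating;
            break;
        case ShipState::Navigating:
            // Steering toward the target would go here.
            timeSinceTarget += dt;
            if (timeSinceTarget >= retargetPeriod) makeNewTarget();
            break;
        }
    }
};
```

A simulation loop would call update once per frame with the elapsed time. With a period of 5 time units, the first call leaves the ship in the Navigating state with one target, and a second target is generated once 5 or more time units have accumulated.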



Summary 

Once important objects and interaction requirements are identified through 
activities such as storyboarding, they are roughly modeled in terms of their 
form, function, and behavior. These objects are to be refined iteratively and 
hierarchically as the development progresses into more details with respect 
to the given computational resources. Objects are put in place to form a 
scene in a hierarchical structure of the object coordinate system using 
operators and 4x4 transformation matrices. 






[Figure 3.21 statechart. MyShip: an "initializing" state, which issues
MESSAGE(myship_initialized), followed by an "operating" state containing four
concurrent states, waiting_up_key, waiting_down_key, waiting_left_key, and
waiting_right_key, with transitions labeled KEYDOWN(Up), KEYDOWN(Down),
KEYDOWN(Left), and KEYDOWN(Right).]

Figure 3.21. The statechart for controlling MyShip. A more detailed script-based
description of the actions taken at each state is given on the companion CD.

Figure 3.22. The statechart for the behavior of OtherShip. A more detailed
script-based description of the actions taken at each state is given on the
companion CD.

[Figure 3.23 statechart. Camera: entered on MESSAGE(myship_initialized), with
concurrent states waiting_z_key, waiting_x_key, and waiting_c_key and
transitions labeled KEYDOWN(z), KEYDOWN(x), and KEYDOWN(c).]

Figure 3.23. The interactive behavior of the Camera as controlled by the User
(looking around). A more detailed script-based description of the actions taken at
each state is given on the companion CD.



Pondering Points 

• Is it possible to express Rot_by_single_axis(k, θ) (rotation around an
arbitrary vector) in the Euler convention?

• Is it possible to factor out the axis order and amounts of rotation around
each principal (fixed) axis from Rot_by_single_axis(k, θ)?

• Is it better to carry out a simulation of a steering wheel with respect to the
World coordinate system or with respect to the Car (Object) coordinate
system?

• Suppose that you are designing the behavior of an adversary entity in a
first-person shooting game. Express its (semi-intelligent) behavior using a
script, data flow diagram, and a state-based representation. Which is the
most convenient? Do the same for expressing a "running while waving"
behavior for a humanlike character. Which do you think is the best
representation to use in terms of expressivity and later maintenance?



Chapter 4 

Putting It All Together 



After a rough specification (Chapters 2 and 3), an initial implementation can
start on a chosen development platform. There are many ways to implement
a VR system. In this book, as we have emphasized the advantages of the
object-oriented approach, we illustrate the implementation detail using an
object-oriented programming language (C++) with an open-source graphics/
simulation/VR library called the OpenSceneGraph (on the Microsoft Windows
platform; [Ope04]; actual code can be found on the companion CD). A
good introductory document to the OpenSceneGraph (and the package itself)
can be obtained for free from a Web site at
http://openscenegraph.sourceforge.net. The OpenSceneGraph is a portable,
high-level graphics toolkit for the development of high-performance graphics
applications such as flight simulators, games, virtual reality, or scientific
visualization. It is built on top of the OpenGL graphics library and provides
many utilities for rapid development of high-performance graphics and VR
applications. For instance, its abstraction supports built-in functionalities
such as efficient scene data structure management, graphics rendering control,
display and window configuration, level of detail nodes, user interface,
special effects, intersection checking, lighting, texture, animation, and even
multiprocessing. OpenSceneGraph also supports many model and image file
formats (for reading and writing).

We also continue to further refine and modify the specifications made in 
Chapters 2 and 3 following the spiral model advocated in Chapter 1. The 
central part of a VR application is the "scene graph" data structure. In the 
previous chapter, we explained how objects form a chain of physical relations 
to form a scene hierarchy. We also mentioned that this hierarchy would be 
extended into a data structure called the scene graph that represents not only 
the hierarchical spatial relations but also much other information about the 
scene and objects in the scene. Most VR packages, including the 
OpenSceneGraph, use such a scene graph data structure in one way or another. 

A scene graph starts with the topmost root node that encompasses the whole 
virtual world. The world is then broken down into a hierarchy of nodes 
representing either spatial groupings of objects, settings of the position of 
objects, animations of objects, or definitions of logical relationships between 
objects. The leaves of the graph represent the physical objects themselves, the 
drawable geometry, and their material properties. The focus of the scene graph 
is usually the representation of the 3D worlds, and its efficient rendering. 

Scene graphs provide an excellent framework for maximizing graphics 
performance. They provide a way of culling (excluding and not drawing) the 
objects that will not be seen on the screen, and of state sorting by properties 
such as textures and materials, so that all similar objects are drawn together. 
Without culling, the CPU, the buses, and the GPU (graphics processing unit) 
would become swamped with too much data. The hierarchical structure of 
the scene graph makes this culling process very efficient. In fact, the scene data 
organized with the scene graph is "traversed" by the system (at each simula- 
tion round) and processed (see Figure 4.1). For instance, the scene graph can 
be traversed by the "CULL" traverser that determines what will get rendered 
among the data present in the scene graph (for instance, some data might 
be chosen not to be drawn as they are far from the user viewpoint). The 
"DRAW" traverser will carry out the rendering process as it traverses 
the scene graph (it converts all the local coordinates into world coordinates 
along the traversal, using the process explained in Chapter 3, and projects 
them to the display). The order of the traversal is determined by the hierarchy of the 
scene graph, in top-down or depth-first fashion starting from the root node (or 
user specified node) [SGI02]. 
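
The depth-first CULL traversal described above can be sketched in plain C++ without any graphics library. The Node type, the helper functions, and the simple distance test are assumptions made for illustration only; they are not OpenSceneGraph classes, and a real cull traverser would also test against the view frustum.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scene-graph node (not an OpenSceneGraph class).
struct Node {
    std::string name;
    double x = 0.0, y = 0.0, z = 0.0;  // position in world coordinates
    double radius = 0.0;               // bounding sphere enclosing the whole subtree
    bool drawable = false;             // true for leaves that carry geometry
    std::vector<std::unique_ptr<Node>> children;
};

// Convenience constructor for the example; places the node on the x axis.
std::unique_ptr<Node> makeNode(const std::string& name, double x,
                               double radius, bool drawable) {
    std::unique_ptr<Node> node(new Node());
    node->name = name;
    node->x = x;
    node->radius = radius;
    node->drawable = drawable;
    return node;
}

// Depth-first CULL traversal: appends the names of drawable nodes whose
// bounding sphere reaches within maxDistance of the viewer. A subtree whose
// bounding sphere is entirely too far away is skipped without visiting it.
void cullTraverse(const Node& node, double viewerX, double viewerY,
                  double viewerZ, double maxDistance,
                  std::vector<std::string>& drawList) {
    assert(maxDistance >= 0.0);
    const double dx = node.x - viewerX;
    const double dy = node.y - viewerY;
    const double dz = node.z - viewerZ;
    const double distance = std::sqrt(dx * dx + dy * dy + dz * dz);
    if (distance - node.radius > maxDistance) {
        return;  // prune the whole subtree
    }
    if (node.drawable) {
        drawList.push_back(node.name);
    }
    for (const std::unique_ptr<Node>& child : node.children) {
        cullTraverse(*child, viewerX, viewerY, viewerZ, maxDistance, drawList);
    }
}

// Builds a tiny example world: world -> { harbor -> { pier }, farShip }.
std::unique_ptr<Node> makeExampleWorld() {
    std::unique_ptr<Node> root = makeNode("world", 0.0, 2000.0, false);
    std::unique_ptr<Node> harbor = makeNode("harbor", 10.0, 50.0, false);
    harbor->children.push_back(makeNode("pier", 12.0, 5.0, true));
    root->children.push_back(std::move(harbor));
    root->children.push_back(makeNode("farShip", 1500.0, 20.0, true));
    return root;
}
```

With the viewer at the origin and a maximum distance of 1000, only the pier survives the cull; raising the distance to 2000 also admits the far ship, and the draw list keeps the depth-first visiting order.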

A typical VR application (such as an OpenSceneGraph application) goes 
through the following initialization process and execution loop. 

Initialization 

1. Initialize OpenSceneGraph. 

2. Configure the graphics pipeline, display channel, and window 
association. 1 



1 Read the OpenSceneGraph introductory manual to get more familiar with the 
concepts of graphics pipeline and channels. 




Figure 4.1. Scene graph traversal order (depth first). 

3. Create (or load) the scene graph. 

4. Set the camera and its view frustum. 



The Loop 



5. Read any external input. 

6. Compute simulation and update the objects and scene graph. 

7. Update the camera position/orientation. 

8. Redraw the scene. 

9. Go back to 5. 
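
The initialization steps and the loop above can be laid out as a small C++ skeleton. This is only a sketch under stated assumptions: the class VrApplication and all of its member functions are hypothetical placeholders, no OpenSceneGraph call is used, and the loop runs for a fixed number of frames so that it terminates (a real program would loop until the user quits).

```cpp
#include <cassert>

// Hypothetical skeleton; every member function stands in for one numbered step.
class VrApplication {
public:
    void initialize() {
        // Steps 1-2: library, pipeline, channel, and window setup would go here.
        // Step 3: create (or load) the scene graph; here a single object position.
        objectX_ = 0.0;
        // Step 4: set the camera; it starts followDistance_ units behind the object.
        cameraX_ = objectX_ - followDistance_;
        initialized_ = true;
    }

    // Steps 5-9: the loop, repeated frameCount times with a fixed time step.
    void run(int frameCount, double dtSeconds) {
        assert(initialized_);
        for (int frame = 0; frame < frameCount; ++frame) {
            const double speed = readInput();  // 5. read any external input
            simulate(speed, dtSeconds);        // 6. simulate and update objects
            updateCamera();                    // 7. update the camera
            redraw();                          // 8. redraw the scene
        }                                      // 9. go back to step 5
    }

    int framesDrawn() const { return framesDrawn_; }
    double objectX() const { return objectX_; }
    double cameraX() const { return cameraX_; }

private:
    // Stand-in for an input device: a constant speed in units per second.
    double readInput() const { return 2.0; }
    void simulate(double speed, double dtSeconds) { objectX_ += speed * dtSeconds; }
    void updateCamera() { cameraX_ = objectX_ - followDistance_; }
    void redraw() { ++framesDrawn_; }

    bool initialized_ = false;
    double objectX_ = 0.0;
    double cameraX_ = 0.0;
    double followDistance_ = 10.0;
    int framesDrawn_ = 0;
};
```

Calling initialize() and then run(4, 0.5) draws four frames, moves the object 4 units along x, and leaves the camera 10 units behind it.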



A channel is equivalent to a camera moving throughout the scene. A camera 
must have a position, orientation, and a view frustum or view volume. 
A view volume is a part of the 3D space that will be visible (thus drawn) 
to the user. It is defined by the "near" and "far" clipping planes and the 
horizontal and vertical field of view (as shown in Figure 4.2). Note that the 
camera is just another object that can be dynamically moved and its parameters 
changed during the course of the application run. The location and 
orientation of the camera are specified in the same way as for the other virtual 
objects (using the 4x4 transformation matrix). In fact, the camera position 
and orientation in the 4x4 transformation matrix can be specified by the 
location vector and two other vectors, the up vector (up direction of the 
camera) and the direction vector. The third vector, orthogonal to the up and 
direction vectors, is obtained by their cross product. These three vectors 
constitute the rotation matrix between the World frame and the camera frame. 

Figure 4.2. The view frustum and camera parameters. 
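
The construction of the camera matrix from the location, up, and direction vectors can be sketched as follows. The conventions are assumptions chosen for illustration: a right-handed World frame, column vectors, the third (side) axis computed as direction x up, and the up vector re-orthogonalized so that the three axes are mutually perpendicular. The type and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Mat4 { double m[4][4]; };  // row-major storage, column-vector convention

Vec3 cross(const Vec3& a, const Vec3& b) {
    return Vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}

Vec3 normalize(const Vec3& v) {
    const double length = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(length > 0.0);
    return Vec3{v.x / length, v.y / length, v.z / length};
}

// Builds the 4x4 camera-to-World matrix. The first three columns hold the
// camera axes (side, up, direction) expressed in World coordinates, and the
// fourth column holds the camera location.
Mat4 cameraToWorld(const Vec3& location, const Vec3& up, const Vec3& direction) {
    const Vec3 d = normalize(direction);
    const Vec3 side = normalize(cross(d, up));  // third axis from the cross product
    const Vec3 u = cross(side, d);              // up, made orthogonal to d
    const Vec3 axes[3] = {side, u, d};

    Mat4 result = {};  // all elements start at zero
    for (int col = 0; col < 3; ++col) {
        result.m[0][col] = axes[col].x;
        result.m[1][col] = axes[col].y;
        result.m[2][col] = axes[col].z;
    }
    result.m[0][3] = location.x;
    result.m[1][3] = location.y;
    result.m[2][3] = location.z;
    result.m[3][3] = 1.0;
    return result;
}
```

For a camera at (1, 2, 3) looking down the negative z axis with up along y, the side axis comes out as the positive x axis, as expected for a right-handed frame.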

As one can see, after the initialization, there is one monolithic loop 
process that, in sequence, checks the external input, runs object simulation, 
updates the scene and the camera according to any input and time progres- 
sion, and repeatedly redraws the output. In particular, the simulation part of 
the program corresponds to the object's behavior model, in most cases, 
coded in plain programming languages in a procedural manner. 

Although we assume a single monolithic thread for the overall computa- 
tion model for simulating and rendering the virtual environment, other types 
of computational models are certainly possible. The simulation loop can be 
split up into individual threads or processes that communicate with each 
other for synchronization purposes, for instance, sensor thread, application/ 
simulation thread, and rendering thread. The rendering thread can even be 
split up into two or more separate threads in the case that two (or more) 
pieces of graphics hardware are necessary for multichannel outputs (e.g., for 
stereo or for multisided display systems). In such a case, care must be taken to 
synchronize all the graphics hardware. In particular, for active stereo systems, 
where it is important that at any given moment all the graphics hardware 
output the image for the same eye (left or right), the hardware "GenLock" 
signal is used to synchronize the output scan time of all the graphics 
hardware. In systems that do not employ active stereo, software frame-level 
synchronization is often sufficient (i.e., a one- or two-frame misalignment 
is negligible to the human eye if the display is in mono or if using passive 
stereo). 2 

In another aspect, the rendering process can be even further separated into 
interleaved threads of the cull (that determines what gets drawn) and draw 
(the actual rendering) processes. Figure 4.3 shows the interleaving of the 
application, cull, and draw threads. In OpenSceneGraph, for instance, the 
scene graph supports multiple graphics contexts for both OpenGL Display 
Lists and texture objects, and the cull and draw traversals have been 
designed to cache rendering data locally and use the scene graph almost 
entirely as a read-only operation. This allows multiple cull-draw pairs to run 
on multiple CPUs that are bound to multiple graphics subsystems. 

Example Design and Implementation (Continued): 
Level 2 Ship Simulator 

We start to modify the Level 1 specifications/requirements to include more 
details (Figure 4.4). For instance, the following additional hypothetical 
requirements may be added. We refine the storyboard as shown in Figure 4.5. 



2 For active and passive stereo, refer to Chapter 6. 

Figure 4.3. The multiprocess computational model for a VR program. The numbers 
in the box indicate the image frame number for which each process is computing. 



With the newly refined requirements, we further refine the overall scene 
and details of form, function, and behavior for each constituent object. 
Figures 4.6 and 4.7 show the updated MSDs. MyShip is now composed of 
additional subobjects, gauges, and radar. The environment objects, sky and 
sea surface, are also added (see Figure 4.8). Note that ShipSimulator, the 
functional entity that manages the overall simulation, is an object without 
any form and thus does not appear in the scene graph. 



Requirements (Level 2, added) 

• More realistic ship movement 
  - E.g., rolling and pitching 
  - Rotating radar 

• Collision detection 

• Status information 
  - Gauge display 
  - Indication of successful docking 



Requirements (Level 3, added) 

• Simple weather effects (e.g., fog) 

• Wave effects 

Figure 4.4. Additional Level 2 and 3 requirements for the Ship Simulator example. 

Figure 4.5. New storyboards for the Ship Simulator example: (a) a collision between 
MyShip and OtherShip; (b) the User is informed of the collision and the appropriate 
response (OtherShip moving backward for 5 seconds and changing its course); 
(c) gauges and a message indicating a successful docking; (d) external view of a 
successful docking. 



Polygon Budgeting 

For objects with form, we plan for their suitable levels of geometric com- 
plexity through the polygon budgeting process [Hof97]. In polygon budget- 
ing, the suitable number of polygons is decided by considering the processing 
and rendering capacity of the target execution environment (i.e., your com- 
puter and graphics board). Table 4.1 assumes that a target platform can 
process about 100,000 polygons per frame (≈5 million polygons per second 
at 50 frames 3 per second). We initially determine the estimates of the object 
polygon counts. The number of LODs and their distribution are initially 
determined by the properties of the application, but they are also subject to 
change, upon iterative performance and presence tuning during later stages. 
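
The budgeting arithmetic above is simple enough to check in a few lines of C++. In this sketch the BudgetEntry type and the function names are hypothetical, and any per-object polygon counts passed in are illustrative inputs rather than values from Table 4.1; only the 100,000 polygons per frame and 50 frames per second figures come from the text.

```cpp
#include <cassert>
#include <vector>

// One row of a hypothetical polygon budget: how many instances of an object
// are in the scene and how many polygons each instance uses.
struct BudgetEntry {
    int instances;
    int polygonsEach;
};

// Polygons per second that the platform must sustain at a given frame rate.
long long polygonsPerSecond(long long polygonsPerFrame, int framesPerSecond) {
    assert(framesPerSecond > 0);
    return polygonsPerFrame * framesPerSecond;
}

// Total polygons per frame requested by all entries of the budget.
long long totalPolygons(const std::vector<BudgetEntry>& entries) {
    long long total = 0;
    for (const BudgetEntry& entry : entries) {
        total += static_cast<long long>(entry.instances) * entry.polygonsEach;
    }
    return total;
}

// True if the requested polygons fit within the per-frame budget.
bool withinBudget(const std::vector<BudgetEntry>& entries,
                  long long budgetPerFrame) {
    return totalPolygons(entries) <= budgetPerFrame;
}
```

For example, 100,000 polygons per frame at 50 frames per second gives 5,000,000 polygons per second, and fifty ships of 106 polygons each use 5,300 polygons of the per-frame budget.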



3 A sufficiently high frame rate is targeted considering the possibility of 
stereoscopic rendering, which requires twice the normal frame rate. 

[Figure 4.6 diagram labels: ShipSimulator; aShip:OtherShip; collided?; yes; move 
backward for 5 seconds; determine the new speed and course; iterate every 10 
seconds until the simulation ends.] 

Figure 4.6. Updated message diagram between ShipSimulator and OtherShip for 
collision handling. 



[Figure 4.7 diagram labels: :ShipSimulator; :MyShip; collided?; yes; display text 
information; stop for 3 seconds.] 

Figure 4.7. Updated message diagram between ShipSimulator and MyShip for 
collision handling. 



Modeling of Function/Behavior 
Objects (Ones Without Geometry): ShipSimulator 

Figure 4.9 contains a data flow diagram for the ShipSimulator (a formless 
object that subsumes various control functions for the simulator), and shows 
how they are coupled to each other. ShipSimulator is composed of three 
processes: SetInitialPosition, which decides the initial position of OtherShip; 
CheckCollision, which checks collisions with other objects; and 
SetMyShipPosition, which accepts a newly joined MyShip for training (there may 
be several users being trained at the same time). All the leaf processes (circular 
blocks in the DFD) will correspond to an actual block of code in the VR program. 
Such a computational module may require a "controller" (the rectangular 
box in Figure 4.9) that is responsible for initiating and coordinating the 
process executions (when it has one or more leaf processes), and bringing 
about state transitions. 

Figure 4.9. The data flow diagram of ShipSimulator [Seo02]. (Reprinted with 
permission from MIT Press © 2004.) 

Figure 4.10. The state-based representation of the controller "ShipSimulator" 
[Seo02]. (Reprinted with permission from MIT Press © 2004.) 

The internals of the controller may be specified using a state-based 
representation similar to a statechart [Har90], as shown in Figure 4.10. In this 
example, the process SetInitialPosition generates the event InitComplete, which 
triggers a state transition for ShipSimulator. This controller changes its 
state from Idle to DoingSimulation on the event start*, and the state 
DoingSimulation has two concurrent states that manage OtherShips and MyShips 
simultaneously. Because all the instances of OtherShip are generated 
automatically at the beginning, the state OtherShipControl changes its substate to 
the state CheckingCollision after the instances of OtherShip are created and 
initialized successfully. The state MyShipControl notifies the initial positions/ 
orientations of instances of the joining MyShips and returns to the state 
WaitingMyShip, to wait for another MyShip to join for training. Making 
specifications such as this prior to actual coding makes it easy to mentally 
visualize and simulate the whole object behavior and promotes a more 
efficient and structured development process. 
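
One possible way to turn such a state-based specification into code is an enumeration of states plus an event handler with a switch statement. The sketch below follows the states and events named in the text (Idle, DoingSimulation, CheckingCollision, InitComplete, and the start event); the Initializing substate, the MyShipJoined event, and the class layout are assumptions introduced for illustration and are not taken from the companion CD.

```cpp
#include <cassert>

enum class TopState { Idle, DoingSimulation };
enum class OtherShipState { Initializing, CheckingCollision };
enum class Event { Start, InitComplete, MyShipJoined };

// Hypothetical controller for ShipSimulator. While in DoingSimulation, two
// concurrent regions (OtherShipControl and MyShipControl) both see each event.
class ShipSimulatorController {
public:
    void handle(Event event) {
        switch (top_) {
        case TopState::Idle:
            if (event == Event::Start) {
                top_ = TopState::DoingSimulation;
                other_ = OtherShipState::Initializing;
            }
            break;
        case TopState::DoingSimulation:
            handleOtherShipControl(event);
            handleMyShipControl(event);
            break;
        }
    }

    TopState top() const { return top_; }
    OtherShipState otherShipControl() const { return other_; }
    int joinedMyShips() const { return joinedMyShips_; }

private:
    // Region 1: once all OtherShips are initialized, start checking collisions.
    void handleOtherShipControl(Event event) {
        assert(top_ == TopState::DoingSimulation);
        if (other_ == OtherShipState::Initializing &&
            event == Event::InitComplete) {
            other_ = OtherShipState::CheckingCollision;
        }
    }

    // Region 2: a joining MyShip is registered, then the region goes back to
    // waiting (modeled here as a self-transition that counts the joins).
    void handleMyShipControl(Event event) {
        assert(top_ == TopState::DoingSimulation);
        if (event == Event::MyShipJoined) {
            ++joinedMyShips_;
        }
    }

    TopState top_ = TopState::Idle;
    OtherShipState other_ = OtherShipState::Initializing;
    int joinedMyShips_ = 0;
};
```

Events that have no transition in the current state (for instance, InitComplete while still Idle) are simply ignored, which mirrors how a statechart treats unhandled events.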
Modeling Interactive Behavior: MyShip 

Although the behavior and function of MyShip are simple, it is the only object 
that has dynamic behavior controlled by user interactions. The state-based 
specification in Figure 4.11 depicts how a User controls the velocity and 
orientation of a vessel using various keys. MyShip communicates with the 
Keyboard object for input events and also transmits events to the ShipSimu- 
lator to indicate collision or success of docking. Two concurrent states are 
added, representing the newly satisfied requirements of producing the rolling 
and pitching motion of the ship. 



Figure 4.11. State-based representation of MyShip's behavior. 



Modeling of Objects with Form (and their LODs): 
OtherShip 

In this subsection, we demonstrate the process of concurrent consideration 
and modeling of form and behavior and LOD generation. In the case of 
OtherShip, we start with a very simple form model of a ship (for no particular 
reason; we could very well start with a rough function/behavior model 
instead). This simplest form model (G1) is composed of a set of boxes and 
has 106 polygons, roughly within the estimate of the initial polygon budget 
(see Table 4.1). The OtherShips simply serve as moving environment objects 
so that the trainees can train to maneuver on the sea. At this point, we can 
refine the original behavior model (B1) from Chapter 3 to handle collision 
and produce pitching and rolling motion and rotating radar (not shown in 
the statechart). The resulting statechart is shown in Figure 4.12. Note that in 
order to include the new behavior of rotating radar, the form model G1 must 
be refined into G2, one with the radar. Thus, the next geometric model G2 is 
produced by adding the "radar" geometry along with other parts for a more 
realistic appearance (see Figure 4.13). 

As one can see, the incremental and hierarchical nature of the modeling 
process naturally leads to the generation of LOD models both for geometry 
and behavior. Figure 4.14 depicts the overall refinement process of OtherShip, 
and the arrows represent the sequence of refinement. Note that the flow 
and number of refinements (or the number of LODs) may vary according to 
the intent of the developer, and as long as the "compatibility" condition 
(that there exist geometric entities corresponding to entities in the behavior 
model) is kept, any geometric and any behavior model can be combined. 
Figure 4.15 shows a snapshot of the resulting "Level 2" implementation. The 
actual code can be found on the companion CD (also see Figure 4.16). 

Figure 4.12. The updated behavior model of the OtherShip. 

Figure 4.13. Geometric LODs for OtherShips. Note that the rotating behavior can 
be realized with the corresponding subobject [Seo02]. (Reprinted with permission 
from MIT Press © 2004.) 

Figure 4.14. Refinement process of the object OtherShip. Geometries or behaviors 
with different levels of detail are constructed along the refinement process (the higher 
the number, the more refined the models are). (Reprinted with permission from MIT 
Press © 2004.) 

Figure 4.15. Snapshot of the Virtual Ship Simulator and excerpts of actual code 
based on the specification. The ships can be at different levels of detail both in terms 
of geometry and behavior. The user should experiment with several configurations to 
try to produce the best effect while maintaining an acceptable frame rate. (Reprinted 
with permission from MIT Press © 2004.) 

.... (omission) .... 

// initialize ShipSimulator 
void ShipSimulator::initShipSimulator() 
{ 
    .... (omission) .... 

    MyShip::init(); 
    OtherShip::init(); 
    SteeringStand::init(); 
    EngineTelegraph::init(); 
    VoyageInfoDisplay::init(); 

    .... (omission) .... 

    createScene(); 

    .... (omission) .... 
} 

.... (omission) .... 

// frame-controlled loop: runs until the ESC key is pressed 
void ShipSimulator::run() 
{ 
    while (myShip->checkESCKey() == false) 
    { 
        // wait for the next frame boundary 
        pfSync(); 

        // update view point 
        chan->setViewMat(myShip->getViewMatrix()); 

        // collision control 
        controlCollision(); 

        // trigger rendering of this frame 
        pfFrame(); 
    } 

    pfExit(); 
} 

.... (omission) .... 

Figure 4.16. Excerpts of actual code based on the specification. The complete 
specification and code are contained on the companion CD. 

Summary 

Based on the specifications, we can start with the initial implementation. 
Starting with their initial rough specifications, the form, function, and 
behaviors of the virtual objects are incrementally and hierarchically refined 
into more detail and coded, guided by the polygon budget. Intermediate- 
level models can be used as level of detail objects. 

Pondering Points 

• Suppose you used a state-based representation to specify the behavior of 
one of your virtual objects. How can this behavior be implemented using a 
common programming language such as C++? 

• How would interobject events actually be implemented or handled? For 
instance, consider a situation where the virtual signal light changes its 
color to green, and the other virtual car waiting in the road must catch this 
event and move accordingly. 



// control collision: test every pair of ships against each other 
void ShipSimulator::controlCollision() 
{ 
    for (int i = 0; i < OTHER_SHIP_NUM + 1; i++) 
    { 
        pfVec3* rec = ships[i]->getBoundaryRec(); 
        for (int j = 0; j < OTHER_SHIP_NUM + 1; j++) 
        { 
            if (i != j) 
            { 
                if (ships[j]->detectcollision(rec)) 
                { 
                    ships[j]->setCollisionMode(); 
                    ships[i]->setCollisionMode(); 
                } 
            } 
        } 
    } 
} 

sample code: ShipSimulator.cpp 



Chapter 5 

Performance Estimation 
and System Tuning 



The Presence and Performance Trade-off 

Most people in the VR community seem to agree with the definition of 
presence as the feeling of being in the VR world and on the importance 
of provision of presence as a defining quality of VR. Several researchers 
have studied the elusive notion of presence. Many factors affect presence 
and some are related to system performance. For instance, it is likely that 
system load will be increased by providing more sensory channels, increas- 
ing visual and simulation fidelity, increasing degrees of interactivity, and so 
on. System delay and occasional disruption in continuity are reported to 
have a very negative effect on presence, even more than lowered pictorial 
realism [Wel96]. Research in this area is still at a preliminary stage; 
therefore, design guidelines for VEs with high presence await future 
research efforts. For the time being, we have to resort to an old-fashioned 
trial-and-error process for system tuning (for producing the highest presence 
with the available resources). Even though trial and error 
is perhaps the most primitive and time-consuming process, there is currently 
no scientific methodology for doing this in a different way. However, by 
sticking to the structured development strategy overall, the trial-and-error 
process can be minimized. As the system is tuned, the system can be further 
refined, while we make trade-offs between expected benefits and required 
computational cost. 



Tuning with the LOD Models 

We can, for instance, start the tuning process by trying out different LOD 
mixes and distributions. Depending on the computational capability of the 
given processor and graphics board, the right LOD mix (e.g., 15 AutoShips 
at G3/B3, 50 at G2/B2, and 30 at G1/B1) can be tuned to produce the best 
effect with an acceptable frame rate. Because there will be many OtherShips, 
we might assume they could dominate the load on the processor and the 
graphics board. Assuming that the application required about 50 to 100 
AutoShips, we check the performance by simulating that number of AutoShips 
at five different LODs, as shown in Table 5.1 (Figure 4.15 shows the 
VR environment). 

Table 5.1. Simulating 50, 75, and 100 OtherShips with Different LODs (a) 

                  Ave. Frame Rate (by number of OtherShips) 
LOD               50        75        100 
L1 (G1, B1)       41        41        40 
L2 (G2, B1)       43        37        30 
L3 (G2, B2)       33        26        22 
L4 (G3, B1)       34        28        25 
L5 (G4, B3)       27        23        18 

(a) An LOD is defined by the geometric LOD denoted Gn and the behavioral LOD denoted Bn, 
where the higher the value of n, the more detailed and more computationally heavy the virtual 
object is. Adapted from [Seo02]. (Reprinted with permission from ACM © 2004.) 

Table 5.1 shows performance test results in terms of the average frame 
rates for each chosen LOD. Although the results 2 show the expected trivial fact 
that more complex models produce lower frame rates, we observe that the 
variations in the frame rates are not linear. For instance, if we were to select a 
2-LOD mix for the case of 50 ships, we might choose L2 and L3, because 
they have higher details at a similar cost versus L1 and L4, respectively. 

Presence/Special Effects: Third Stage of Spiral Process 

In Chapter 2, we explained the spiral process, one of the modeling philoso- 
phies emphasized in this book. The spiral process has three stages and the 
basic iterations of 'Requirement Analysis — Design — Validation' occur con- 
tinuously at each stage. Presence and special effects are considered toward 
the end of the spiral with close monitoring of the remaining resources after 
all the basic functionalities are built and reasonably tested. Table 5.1 already 
shows that even with more than 100 AutoShips at high model detail, we 
would be able to obtain up to 30 ~ 40 fps performance (with < 50 fps, 
adding stereoscopy might be sacrificed depending on a careful choice of 
the LOD mix). Thus, we can attempt to realize the 'Level 3' requirements 
given in the previous chapter, adding the weather and wave effects. 



Likewise, the LOD can be simply selected and dynamically switched based on 
other criteria such as distance from the user, screen projected size, and so on. 
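
Distance-based LOD switching can be sketched in a few lines. The function name, the switch distances, and the number of levels below are hypothetical; the only convention taken from the text is that a higher level number means a more detailed (and more expensive) model.

```cpp
#include <cassert>
#include <vector>

// Returns the LOD level to use for an object at the given distance from the
// user. switchDistances must be sorted in ascending order; with k switch
// distances there are k + 1 levels, numbered 1 (coarsest) to k + 1 (finest).
// The finest level is used up to the first switch distance, and one level of
// detail is dropped each time a further switch distance is reached.
int selectLod(double distance, const std::vector<double>& switchDistances) {
    assert(distance >= 0.0);
    int level = static_cast<int>(switchDistances.size()) + 1;
    for (double threshold : switchDistances) {
        if (distance < threshold) {
            break;
        }
        --level;
    }
    return level;
}
```

With switch distances of 100 and 500 units, an object 50 units away is shown at level 3, one 200 units away at level 2, and one 1000 units away at level 1.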

2 The simulation was performed on a desktop PC with an Intel Pentium 4 1.4-GHz 
CPU, 512 MB of (RDRAM) main memory, and an Nvidia GeForce2 MX graphics 
card, using MS Windows 2000, WorldToolKit, and MS Visual C++. 






Taking Advantage of the Graphics Hardware Features: 
Fog Effect 



It turns out that implementing a simple weather effect such as fog is very 
easy because of the built-in capabilities of today's graphics hardware. And 
because it is hardware-supported, there is not much performance drop. In 
fact, there are many other special effects features that are hardware-supported, 
such as shadow, texture mapping, particle simulation, and so 
on. In particular, we take a closer look at taking advantage of images or 
textures to enhance visual realism and ease of modeling. 
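
To make the fog effect mentioned above concrete, the arithmetic of a linear fog model can be written out on the CPU. This is only an illustration of what the hardware computes per pixel, assuming the common linear model in which the fog factor falls from 1 at a start distance to 0 at an end distance; the type and function names are hypothetical, and in practice the graphics library is simply told the fog color and range.

```cpp
#include <algorithm>
#include <cassert>

struct Color { double r, g, b; };

// Linear fog factor: 1 means no fog (surface fully visible), 0 means the
// surface is completely hidden by fog. Values are clamped to [0, 1].
double linearFogFactor(double distance, double fogStart, double fogEnd) {
    assert(fogEnd > fogStart);
    const double f = (fogEnd - distance) / (fogEnd - fogStart);
    return std::min(1.0, std::max(0.0, f));
}

// Blends the surface color with the fog color using the fog factor.
Color applyFog(const Color& surface, const Color& fog, double factor) {
    return Color{factor * surface.r + (1.0 - factor) * fog.r,
                 factor * surface.g + (1.0 - factor) * fog.g,
                 factor * surface.b + (1.0 - factor) * fog.b};
}
```

With fog starting at 10 units and ending at 110 units, a red surface 60 units away has a fog factor of 0.5 and is drawn halfway between red and the fog color.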

Using Images and Textures for Object/Scene Modeling 
and Fast Rendering 

One of the ways to avoid tedious modeling efforts yet achieve reasonable 
realism is the use of textures. Textures or images also enhance visual 
photorealism, and today's graphics hardware provides great support for fast 
texture mapping. A texture is (usually) a rectangular image sample 
that can be pasted upon the object surface (which would in effect be a 
collection of polygons or triangles). There are many ways a texture can be 
pasted upon the object surface. In the simplest case, a texture is applied to a 
planar surface. The process is depicted in Figure 5.1. 



[Figure 5.1 annotation: 1. Three corresponding points between the triangle 
vertices and a region in the texture.] 

Figure 5.1. Pasting a texture on a 3D triangle. 

First, the user must specify the corresponding texture pixels (texels) for the 
three vertices. The three vertices would be projected to the display screen, 
and during the scan conversion process to render the interior of the projected 
triangle on the screen, the right color must be brought from the correspond- 
ing place in the texture. The final rendered color at that pixel is a function of 
this texel color and other parameters related to lighting and shading (e.g., 
viewpoint, triangle/surface normal, shading model, etc.). In order to map the 
pixels in the screen space to the texture space, each space is parameterized 
using unified coordinates as shown in Figure 5.2. 
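
The lookup of the "right color" for an interior pixel can be sketched with barycentric interpolation of the texture coordinates given at the three vertices. This is a simplified sketch: it interpolates linearly in screen space and ignores the perspective correction that real hardware applies, and all type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };

// Twice the signed area of the triangle (a, b, c).
double edge(const Vec2& a, const Vec2& b, const Vec2& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Interpolates the texture coordinates (u, v) at screen point p inside the
// projected triangle (v0, v1, v2), given the texture coordinates uv0, uv1,
// uv2 assigned to the three vertices. The result is returned as (u, v).
Vec2 interpolateUV(const Vec2& p,
                   const Vec2& v0, const Vec2& v1, const Vec2& v2,
                   const Vec2& uv0, const Vec2& uv1, const Vec2& uv2) {
    const double area = edge(v0, v1, v2);
    assert(area != 0.0);  // degenerate triangles cannot be textured
    const double w0 = edge(v1, v2, p) / area;  // weight of vertex v0
    const double w1 = edge(v2, v0, p) / area;  // weight of vertex v1
    const double w2 = edge(v0, v1, p) / area;  // weight of vertex v2
    return Vec2{w0 * uv0.x + w1 * uv1.x + w2 * uv2.x,
                w0 * uv0.y + w1 * uv1.y + w2 * uv2.y};
}

// Converts a coordinate in [0, 1] to a texel index in [0, size - 1].
int texelIndex(double coord, int size) {
    assert(size > 0);
    const int index = static_cast<int>(std::floor(coord * size));
    return std::max(0, std::min(size - 1, index));
}
```

At a vertex the interpolated coordinates equal the ones assigned to that vertex, and in between they vary linearly, so each screen pixel fetches a texel from the matching place in the texture.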

There are many other forms of texturing that developers can take advan- 
tage of for producing various modeling details, for instance, cylindrical 
mapping, spherical mapping, environment mapping (for modeling environ- 
ment reflection on shiny objects), and bump mapping (for modeling protru- 
sions). Most VR, game, or graphics engines provide built-in functionalities 
for such texture mappings. 

Figure 5.2. Parameterizing the texture and screen/object space for establishing the 
correspondence between (u, v) and (U, V). 

Figure 5.3. Two cases of billboards: (a) placing multiple planar textures around the 
axis of symmetry; (b) rotating the texture to the changing viewer direction. 

In addition to modeling the complex surface properties of objects, textures 
can be used to represent the object itself. Billboards and moving textures 
(sprites) are such examples. A billboard is a simple planar primitive with a 
pasted texture that is continuously rotated to face the viewer (see Figure 5.3). 
Billboards are often used for (rotationally symmetric) objects that look 
similar when viewed from different directions. One variation is to place more 
than one textured planar primitive at different angles around the axis of 
symmetry (instead of rotating them to face the viewer). The transparency of 
the texture is set such that only the relevant object is opaque (e.g., so that 
objects behind the trees can be seen through). 
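
The rotation that keeps a billboard facing the viewer reduces to one angle when the axis of symmetry is vertical. The sketch below assumes a right-handed frame with y pointing up and a billboard whose normal initially points along the positive z axis; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { double x, z; };  // horizontal (ground-plane) components only

// Yaw angle in radians, about the vertical (y) axis, that turns a billboard
// whose normal initially points along +z so that it faces the viewer.
double billboardYaw(const Vec2& billboard, const Vec2& viewer) {
    const double dx = viewer.x - billboard.x;
    const double dz = viewer.z - billboard.z;
    assert(dx != 0.0 || dz != 0.0);  // viewer must not sit on the axis
    return std::atan2(dx, dz);
}

// Horizontal direction of the billboard normal after rotating by yaw.
Vec2 rotatedNormal(double yaw) {
    return Vec2{std::sin(yaw), std::cos(yaw)};
}
```

Recomputing this angle every frame from the current viewer position, and applying it as a rotation about the vertical axis, gives case (b) of Figure 5.3.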

Sprites, or moving textures, are another texture-based technique widely used 
in the field of computer games for character animation and other 
special effects. Key postures of a character in a motion sequence are captured 
as textures, and they are displayed one after another in quick succession to 
produce the animation effect (see Figure 5.4). Sprites can be used not only for 
character animation but also for other special effects such as fire and glow, 
explosions, water flow effects, clouds, and the like. More advanced image-based 
methods can generate intermediate images from neighboring images (in the 
motion or view sequence) for a smoother transition and less distortion. 

Figure 5.4. Six textures switched fast to produce animation effect (Sprite). 

Finally, textures can be used to model not just objects but a big part (if not all) 
of the scene, such as a background of sky or a mountain range. 
For instance, in the QuickTime VR approach by Chen [Che95], a panoramic 
image is captured and stored as a cylindrical environment map, and depending 
on the viewer location and viewing direction, the appropriate part 
of the environment map is retrieved and rendered to the user (see Figure 5.5). 
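
The retrieval step for a cylindrical environment map can be illustrated by mapping a viewing direction to a pixel column of the panorama. This is a minimal sketch under stated assumptions: the panorama covers a full 360 degrees, column 0 corresponds to a yaw of 0, and the function name is hypothetical. It does not reproduce the warping step of the QuickTime VR method itself.

```cpp
#include <cassert>
#include <cmath>

// Maps a viewing direction (yaw in radians, any sign or magnitude) to the
// pixel column of a full 360-degree cylindrical panorama of the given width.
int panoramaColumn(double yawRadians, int imageWidth) {
    assert(imageWidth > 0);
    const double twoPi = 2.0 * 3.14159265358979323846;
    double wrapped = std::fmod(yawRadians, twoPi);
    if (wrapped < 0.0) {
        wrapped += twoPi;  // bring negative angles into [0, 2*pi)
    }
    const int column = static_cast<int>(wrapped / twoPi * imageWidth);
    return column % imageWidth;  // guard against rounding up to imageWidth
}
```

The visible part of the map is then the range of columns centered on this column whose width is proportional to the horizontal field of view.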




Figure 5.5. QuickTime VR. A panoramic environment map is constructed from a 
number of images and depending on the view direction, a portion from the map is 
retrieved, warped, and rendered to the user [Che95]. (Reprinted with permission from 
the ACM © 2004.) 

In the approach called "Tour into the Picture" by Horry et al. [Hor97], a 
static image is split up into regions according to the location of the vanishing 
point, and each of the pieces is pasted upon the interior of a large rectangular 
box (see Figure 5.6). Object and scene modeling (and even behavior modeling) 
can be made easier by employing these texture techniques in a creative 
way. 



Adding the Wave Effect 

We are now ready to realize the remaining special effect, 
improving the appearance of the sea surface by using the moving texture 
technique. Instead of using a big static textured polygon to represent the 
sea, we can employ simple logic to rotate through four texture images to 
mimic the "visual" dynamics of the sea. (See the code on the companion CD 
for details of this simple special effect trick.) Figure 5.7 shows this newly 
added behavior of Sea. ShipSimulator determines the period according to the 
parameters of the simulation environment, that is, the velocity of the wind 
and the height of the wave, and generates the event "next_texture." 

Figure 5.7. Moving sea behavior by moving textures. 
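
The moving sea behavior can be sketched as a small event-driven object plus a timer that generates the "next_texture" event. The class names are hypothetical and this is not the code from the companion CD. The period is passed in as a parameter, because the text does not give the rule by which ShipSimulator derives it from the wind velocity and wave height.

```cpp
#include <cassert>

// Hypothetical Sea object: cycles through its textures on each event.
class SeaSurface {
public:
    explicit SeaSurface(int textureCount) : textureCount_(textureCount) {
        assert(textureCount > 0);
    }
    // Handler for the "next_texture" event: advance to the next image,
    // wrapping around after the last one.
    void onNextTexture() { current_ = (current_ + 1) % textureCount_; }
    int currentTexture() const { return current_; }

private:
    int textureCount_;
    int current_ = 0;
};

// Accumulates elapsed time and fires one "next_texture" event per period.
class NextTextureTimer {
public:
    explicit NextTextureTimer(double periodSeconds) : period_(periodSeconds) {
        assert(periodSeconds > 0.0);
    }
    void advance(double dtSeconds, SeaSurface& sea) {
        elapsed_ += dtSeconds;
        while (elapsed_ >= period_) {
            elapsed_ -= period_;
            sea.onNextTexture();
        }
    }

private:
    double period_;
    double elapsed_ = 0.0;
};
```

Calling advance() once per frame from the simulation loop is enough; a shorter period makes the sea appear more agitated, and a longer one makes it calmer.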

Summary 

Once the basic functionalities are in place, further refinements can be added 
while considering the overall performance and available computational 
resources. Such added functionalities may include special effects, physical 
simulation, and presence-enhancing elements. The environment itself can be 
made richer by simply adding more objects to the scene subject to resource 
availability as well. 

Pondering Points 

• Would you choose to have stereoscopic display, thus effectively reducing 
the rendering quality by half by having to generate two separate outputs 
for left and right displays, or to render in monoscopy with higher graphic 
quality? Explain your answer. 

• Would you choose to have many simple environment objects (e.g., rocks) 
or fewer with higher detail? Explain your answer. 



Section II 

Making of the Virtual Reality 



A VR system is a type of simulation system. The first part of the book went 
over how to build a simple object-oriented simulation system. Virtual reality 
systems, more so than other software/hardware systems, are highly complex 
and heterogeneous. Developers must go through much trial and error for 
content configuration and performance tuning; thus the system must have 
high maintainability. For that reason, a structured system building approach 
was advocated and highlighted in the first part of the book. The second part 
of the book concentrates on various features that breathe life into a "mere" 
simulation system and make it a true experiential virtual reality system, 
namely, 3D multimodal interactions, usable and natural interfaces, physical 
simulations and avatars, and other presence-promoting effects. 



Chapter 6 

Output Display 



The term 'display' is usually associated with the 'visual' output. However, in 
the context of virtual reality, one of whose goals is to mimic the way humans 
usually interact in the real world, it refers to any modality output, that is, 
visual, aural, haptic (a term referring to the sense of touch and force 
feedback), olfactory, thermal, taste, and so on. A VR system, in simulating 
an experience, may (and should strive to) generate various signals and 
stimulations in many modalities, and use various display devices to convey 
them to the human user. Ideally, the display devices should be ergonomically 
designed, have sufficient fidelity or resolution for the user, and match the 
perceptual capability of humans. At the same time, the application develop- 
ers must understand the perceptual capabilities of the human sensory sys- 
tem, and convey the right amount of modality stimulations, integrate and 
synchronize them, and deliver them to the human user using the right display 
devices. We start by examining the important parameters of the human 
visual system, and the various display systems available for VR usage, and 
how to use them for appropriate visual effects. 



The Human Visual System 

Let's examine very briefly how the human eye operates to acquire raw visual
input. Figure 6.1 shows a simple inside look at the human eye. Light
reflected off objects in the world enters the eye and is refracted first through
the cornea (75%), then the lens (23%), and the liquids within the eyeball (2%)
before reaching the retina at the back of the eyeball. The image is formed
upside down on the retina. The retina is a region containing millions
of photoreceptor cells. Two types of photoreceptors make up the retina: the
rods and the cones. The rods are more sensitive to changes in light intensity
(or low-intensity light) and are distributed more in the peripheral region.
Cones react more to high-intensity light and are concentrated at the center of
the retina, called the fovea. The image formed at the retina is more focused and
has clearer details in the 'foveal' region, whereas the peripheral region is less
so, but with greater sensitivity to detecting 'moving' objects. The light
intensity/color information from photoreceptor cells of the right and left
eyes is relayed to the left and right parts of the brain through the optic nerve.

Figure 6.1. The anatomy of the human eye.

The amount of light that enters the eyeball is controlled by the iris (the 
opening of the eye itself is called the pupil) and the associated muscles. The 
image focus is controlled by the muscles that act on the lens by changing its 
size and shape. The muscles that control various parts of the eye are 
collectively called the oculomotor muscles.

There are many parameters, most of them attributes of the components
that constitute the human eye system, that ultimately determine the
quality of the acquired raw visual input. For instance, the density of the
photoreceptor cells is an important factor in determining visual
acuity (the minimum size of a recognizable visual feature). The way the
photoreceptors react to light determines human sensitivity to brightness.
Among the many such properties or parameters of our visual system, we
make a note of the following.

First is the visual acuity as explained above. Human visual acuity is the 
minimum lateral length the human eye can perceive. Note that acuity is at its 
highest resolution in the foveal area (central region of the visual field), and 
drops at the periphery of the retina. However, in general, humans are known 
to have about 5-arc min 1 of visual acuity. With this visual acuity, humans 
would be able to recognize a letter of about 0.4 in (0.8 cm) from a 20-ft 
distance (this is called 20/20 vision for the left and right eyes; see Figure 6.2). 
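
The relationship between a visual angle and the size it covers at a given distance can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming simple flat geometry (an object centered on the line of sight); the function name `subtended_size` is a hypothetical helper, not something from the text. For 5 arc min at 20 ft it gives roughly 0.35 in (about 0.89 cm), which is in line with the rounded values quoted above.

```python
import math

ARCMIN_PER_DEGREE = 60.0

def subtended_size(distance, angle_arcmin):
    """Return the size of an object that subtends angle_arcmin at the given
    distance. The result is in the same unit as distance."""
    angle_rad = math.radians(angle_arcmin / ARCMIN_PER_DEGREE)
    return 2.0 * distance * math.tan(angle_rad / 2.0)

# 5 arc min viewed from 20 ft (240 in), as in the text.
letter_in = subtended_size(20 * 12, 5.0)
letter_cm = letter_in * 2.54
print(f"letter height: {letter_in:.2f} in ({letter_cm:.2f} cm)")
```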

1 1 arc min = 1/60 degree, and 1 arc sec = 1/60 arc min.

Figure 6.2. Visual acuity and 20/20 vision. With about 5-arc min of visual acuity
(or 20/20 vision), a letter of 0.4 in can be recognized from a 20-ft distance.

Human 'visual acuity' should be considered in determining the spatial or
angular resolution of the display system with respect to the user's distance
from the display. The spatial 'resolution' of the display system refers to the
number of pixels that can be displayed in a unit display area, and the angular
resolution refers to the visual angle the pixel subtends from a particular
viewing distance. A display resolution (spatial or angular) depends on
the pitch, that is, the size of the pixel. Another important visual
parameter is the 'Field Of View (FOV)', the angle subtended by the viewing
surface from a given observer location. Humans have an FOV of about 120
degrees vertical and 180 degrees horizontal. In fact, because humans can
rotate their necks, they can obtain a virtually 360-degree spherical FOV.
Providing a wide FOV (matching it to that of humans) is a very important
factor in promoting the sense of presence. Various display devices, other
than monitors, in combination with head-tracking, can be used to provide a
wide physical or operational FOV. Given a display system and a user
location, one can compute the FOV, and set the resolution of the display
system appropriately according to the visual capability of the user (see
Figure 6.3).
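
As a rough illustration of computing the FOV and the angular resolution for a given display and user location, consider the sketch below. It assumes a flat screen viewed head-on from its center; the function names and the monitor dimensions (0.5 m wide, 1920 pixels across, viewed from 0.6 m) are hypothetical values chosen only for the example.

```python
import math

def field_of_view_deg(screen_width, viewing_distance):
    """Horizontal angle (degrees) subtended by a flat screen seen head-on."""
    return math.degrees(2.0 * math.atan(screen_width / (2.0 * viewing_distance)))

def arcmin_per_pixel(screen_width, pixels_across, viewing_distance):
    """Visual angle (arc min) subtended by one pixel near the screen center."""
    pitch = screen_width / pixels_across  # pixel size, same unit as the width
    angle_deg = math.degrees(2.0 * math.atan(pitch / (2.0 * viewing_distance)))
    return angle_deg * 60.0

# Hypothetical desktop monitor: 0.5 m wide, 1920 pixels across, viewed from 0.6 m.
fov = field_of_view_deg(0.5, 0.6)
res = arcmin_per_pixel(0.5, 1920, 0.6)
print(f"FOV: {fov:.1f} degrees, angular resolution: {res:.2f} arc min per pixel")
```

Comparing the per-pixel angle with the acuity of the intended user gives a first indication of whether individual pixels would be distinguishable from that viewing distance.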




Figure 6.3. Field of view and convergence angle.

Figure 6.4. Setting various parameters for the display system.



Other important display system parameters are the Critical Flicker
Frequency (CFF) and the Refresh Rate (RF). CFF refers to the rate at which
the whole display screen is completely rescanned (a display system usually
works by continually redrawing each horizontal 'scan' line (or rescanning) at
a very high rate); humans are known to start perceiving flicker
at a CFF of 50 Hz (see Figures 6.4 and 6.5). The RF (also known as the
frame rate) refers to the rate at which the content is updated to produce
smooth animation. Humans usually start to notice that the animation is not
smooth when the RF drops below about 10 to 12 Hz. Other display parameters
that a developer might be able to control, but that are not used much in
practice, are the hue, brightness, and contrast.




Figure 6.5. Scan line and critical flicker frequency in a display monitor. 

Human Depth Perception and Stereoscopy

On a macro scale, humans do live attached to a mostly flat ground, making
them 2D-oriented creatures. But humans are also 3D-oriented creatures, and 
operate daily using the depth perception capability in the immediate space 
around them. Consequently, providing depth information is important in 
realizing 3D and natural interaction in the virtual environment. 

Depth perception is possible in many ways. Physiologically, depth infor- 
mation is extracted from the two slightly different views of the world that are 
input through the right and left eyes (Figure 6.6). The two images are fused 
in the brain and the difference (also known as the disparity) between them is 
processed in the brain to create the sense of depth. This is called depth 
perception or stereoscopy from binocular disparity (we come back to binocular
disparity later). Binocular disparity is a very strong depth perception
cue for viewing ranges within 10 to 15 m from the eyes.

Another physiological fact used by the brain is the signal that comes
from the oculomotor muscles. When looking at objects, humans adjust
the size of their eye lens and rotate the eyeballs. Adjusting the size of
the eye lens (to focus on an object) is called 'accommodation' and the
rotation of the eyeball (to focus also but more for fusing images from two
eyes) is called 'convergence' (see Figure 6.7). Note that accommodation and
convergence are coupled together; that is, when the eyes converge on a
certain object, the accommodation automatically kicks in. The amounts of
accommodation and convergence are sensed by the brain to help determine
the depth of the object, although these cues are most effective within the
range of 5 to 10 m from the user.

Figure 6.7. Accommodation and convergence.

There are also many psychological (nonphysiological) cues that help
humans sense depth. They include the effect of perspective views (parallel
lines coming to a vanishing point on the horizon), object occlusion (feeling
that the occluded object is deeper), existence of shadows, motion parallax
(far objects seem to move less rapidly), and relative size (far objects look
smaller) (see Figure 6.8). Actually, the sensing of motion and of shadow are
both psychological and physiological cues. There are regions in the brain,
where the visual information is processed, that are responsible for sensing
shadow, disparity, and motion. All of these cues are known to be more or
less additive (the more cues there are, the richer the depth information is),
and can be used effectively to provide the sense of depth in a virtual
environment.

Figure 6.8. Various psychological depth cues.

Stereo and Binocular Disparity

To be precise, binocular disparity refers to the difference in retinal images
between the two eyes due to the projection of objects at different depths. As
shown in Figure 6.9, when one focuses on a point (point A) in space, the
depths of the other points (e.g., point B) are felt with respect to the focused
point.

The depth of the focused point A becomes the "zero" disparity point,
serving as a reference point. Other points in space that are closer or farther in
depth will have different retinal projections in the two eyes relative to the
zero disparity point. In the figure, the points on the retina at which point B
projects are off from the points to which the focused point A projects. This
"off" amount is called the disparity. Because there are two eyes, the situation
would create two disparity values. The total disparity is the sum of the
disparities from the left and right eyes. Note that the disparity value carries
a sign with respect to the zero disparity point: for the right eye, + when
disparity is off to the clockwise direction and - when off to the
counterclockwise direction (and vice versa for the left eye).

Figure 6.10 shows the relationship between the disparity value and the
relative depth (distance between the two points in space A and B) [McK92].
The figure shows that the disparity is directly related to the difference in the
'convergence' angles for the two points A and B. In fact, the difference in
convergence angles between two points is equivalent to the binocular
disparity as measured in terms of the angles. Furthermore, this difference in
convergence angles is proportional to the relative depth (D1 - D2). Note
that the relative depth has a nonlinear relationship with D2, the distance to
the object. Suppose, in a stereo display system, B represents the display
surface; then D2 becomes the viewing distance. That is, the depth felt by
the user will be quite different for different viewing distances (e.g., when
viewed by a large audience; see Figure 6.11).
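
The dependence of disparity on viewing distance can be explored numerically. The sketch below is an illustration under stated assumptions rather than the formulation of [McK92]: both points are taken to lie straight ahead on the midline between the eyes, D1 is the distance to point A and D2 the distance to point B, the IPD defaults to 0.065 m, and the sign convention (positive when A is farther than B) is chosen here for convenience. The helper names are hypothetical.

```python
import math

IPD = 0.065  # metres; an assumed average interpupillary distance

def convergence_angle(distance, ipd=IPD):
    """Angle (radians) between the two lines of sight for a midline point."""
    return 2.0 * math.atan(ipd / (2.0 * distance))

def disparity(d1, d2, ipd=IPD):
    """Difference in convergence angles (radians) between A at d1 and B at d2.
    Sign convention assumed here: positive when A is farther away than B."""
    return convergence_angle(d2, ipd) - convergence_angle(d1, ipd)

def disparity_small_angle(d1, d2, ipd=IPD):
    """Small-angle approximation: ipd * (d1 - d2) / (d1 * d2)."""
    return ipd * (d1 - d2) / (d1 * d2)

# The same 0.1 m relative depth seen from different viewing distances D2.
for d2 in (0.5, 1.0, 2.0, 5.0):
    d1 = d2 + 0.1
    exact_arcmin = math.degrees(disparity(d1, d2)) * 60.0
    approx_arcmin = math.degrees(disparity_small_angle(d1, d2)) * 60.0
    print(f"D2 = {d2:.1f} m: {exact_arcmin:7.2f} arc min (approx. {approx_arcmin:7.2f})")
```

In the approximation the disparity is proportional to (D1 - D2) but also divided by the product D1 * D2, which is one way to see why the same relative depth yields a much smaller disparity at larger viewing distances.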




Figure 6.9. Binocular disparity.

Figure 6.10. Disparity, convergence angles, viewing distances, and (relative) depth;
i = interpupillary distance (IPD) = interocular distance (IOD).
(Adapted from [McK92] and reprinted with permission from Presence © 2004.)






Figure 6.21. Using light polarization to produce stereo. 

right would have its polarization axis aligned vertically. The audience wears
glasses that have a Polaroid filter for each eye. The result of this arrangement
of projectors and filters is that the left eye sees the movie that is projected from
the right projector and the right eye sees the movie that is projected from the
left projector (see Figure 6.21). This gives the viewer the perception of depth.
Note that if only one display system is being used (for instance, rather than
two projectors as shown in the figure), the interlacing technique can be used to
display the right and left images simultaneously. That is, the even scan lines
are used to render the right image and the odd for the left (see Figure 6.22).
Note that this would reduce the image resolution by half.
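
The interlacing scheme can be sketched in a few lines. This is a minimal illustration, assuming each image is simply a list of scan lines (rows) of equal length and that row 0 counts as an even line; the function name `interlace` is hypothetical.

```python
def interlace(left_rows, right_rows):
    """Build one frame whose even scan lines come from the right image
    and whose odd scan lines come from the left image."""
    if len(left_rows) != len(right_rows):
        raise ValueError("left and right images must have the same height")
    return [
        right_rows[y] if y % 2 == 0 else left_rows[y]
        for y in range(len(left_rows))
    ]

# Toy six-line images; each "row" is just a label here.
left = [f"L{y}" for y in range(6)]
right = [f"R{y}" for y in range(6)]
frame = interlace(left, right)
print(frame)
```

Each eye receives only every other scan line of its own image, which is why the effective vertical resolution per eye is halved.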

Time Multiplexing (Active Stereo) 

Figure 6.22. Taking advantage of interlacing to display the right and left image
simultaneously. (Courtesy of Namgyu Kim.)

The next popular method of creating a nonautostereoscopic system is called
the time multiplexed method. The methods so far produced left and right
images that are actually rendered simultaneously and filtered for viewing. The
time multiplexed systems switch very fast between the left and right images,
and render only one image at a time. This system is also often called the active
system, as opposed to the polarized glass approach, mostly referred to as the
passive system. Figure 6.23 shows an example of a time multiplexed system
using a color wheel that is appropriately designed and turns, in synchronization
with the display, at a rate so that each eye only sees the appropriate image.

Figure 6.23. Time multiplexed stereo using a color wheel.

A more elegant solution, but based on the same principle, is through the 
use of shutter glasses. The user wears electronic glasses that operate in 
synchronization with the display system in that when the left image is 
shown, the right part of the glass is blocked (so that only the left eye gets 
the left image) and vice versa. The glasses (or the shutter glasses) are usually 
made of an LCD so that the "blocking" can be achieved by making the LCD 
screen/window "black" by a synchronized electric signal (see Figure 6.24). 




Figure 6.24. Time multiplexed stereo display system using shutter glasses. 

Because the switching of the image is very fast, the user is not aware that
only one image is actually being rendered at a time. In fact, the switching
frequency must be set at around 60 Hz for each eye, the same as the Critical
Flickering Frequency (CFF). Thus the display system must be able to support
a refresh rate of about 120 Hz to support time multiplexed stereo systems.
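
The arithmetic behind the 120 Hz figure, and the alternation of the shutters, can be written out as a small sketch. It assumes an idealized display with no switching latency; the function names and the choice to start with the left image are hypothetical.

```python
def required_refresh_hz(per_eye_hz=60.0, eyes=2):
    """Display refresh rate needed so that each eye still gets per_eye_hz."""
    return per_eye_hz * eyes

def shutter_schedule(refresh_hz, frames):
    """List (time in ms, image shown, shutter that is blocked) per frame."""
    schedule = []
    for i in range(frames):
        image = "left" if i % 2 == 0 else "right"
        blocked = "right" if image == "left" else "left"
        schedule.append((round(1000.0 * i / refresh_hz, 2), image, blocked))
    return schedule

refresh = required_refresh_hz()
for t_ms, image, blocked in shutter_schedule(refresh, 4):
    print(f"t = {t_ms:5.2f} ms: show {image} image, block {blocked} shutter")
```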

Head-Mounted Display 

Finally, the third major nonautostereoscopic display system used for VR is
the Head-Mounted Display (HMD). An HMD usually employs two separate
display devices designed to provide an isolated display for each eye (see Figure
6.27; note that there are HMDs made of only one device that extract two 
channels of display for each eye through the use of optics). Two separate 
(synchronized) images (for the right and left eye, respectively) are generated 
and fed into the respective display channel to create the stereo effect. The 
display isolation also creates the feeling of immersion aside from the stereo- 
scopic effect, another important factor in creating effective VR applications. 
Unlike large displays such as monitors or projectors, HMDs are worn on the 
head (in order to provide isolated images to the eyes). As for the display 
device itself, miniature LCDs or CRTs are used. The small images that 
appear on these LCDs or CRTs are magnified through the use of optics. 

Simple magnifiers (Figure 6.25) are problematic because the magnifying
lens often cannot be placed close to the image to produce a wide FOV due to
other image-enhancing modules. The eyes must be positioned quite closely at
a certain spot from the magnifying lens in order to receive a bright-enough
image. This can cause strain on the eyes even though the exit pupil (see
below) is quite forgiving. Instead, a compound microscope HMD design is
often used as shown in Figure 6.26. A second lens is used to produce an
intermediate image. This combination produces a small range of distance
(called the exit pupil) at which the eyes can be positioned to receive most of
the images. To be precise, the exit pupil (or eye motion box) is the area where
the eye can be placed in order to see the full display. If the eye is outside the
exit pupil then the full display will not be visible. The use of the second lens
also allows for a wider realizable FOV as the second lens can be placed very
close to the intermediate image.

Figure 6.25. A simple magnifier design for HMD.

Figure 6.26. The compound microscope HMD design.

The lens close to the eye is usually adjustable so that the exit pupil can be 
adjusted to the position of the eyes for user convenience. The exit pupil 
formed in this way is usually located farther than the sweet spot of the simple 
magnifiers, thus causing less eyestrain. However, the exit pupil is not so 
forgiving and the right positioning of the eye becomes more important. On 
the other hand, to allow users with glasses to use the HMD, there must be 
enough distance between the eye and the closest optical element from the eye. 
This distance is called the eye relief. An eye relief of 25 mm is usually known
to be the minimum for use with eyeglasses. (If the HMD is focusable such that
eyeglasses are not required, then the eye relief can be less.) Generally the 
greater the eye relief, the smaller the exit pupil will be [CRL04]. 

HMDs are attractive because they provide images "isolated" from the external
world (the eyes only see the images and not the outside world) and are often
coupled with head-tracking to provide viewpoint-dependent display. This
results in a much higher sense of immersion and presence. However, most
HMDs suffer from narrow fields of view and heavy weight (not to mention
being tethered to the computer). Combined with the fundamental problem that
stereo display systems cause inconsistency between the eye's accommodation
and convergence, this poses a big human factors problem. Other problems
with the HMD include the imprecise location of the eye, which changes
while the user is wearing the device (with respect to the exit pupil and eye
relief). In addition, the eyes may not be centered with respect to the
image planes (screens), nor their gaze direction perpendicular to them.

HMDs come in two varieties: see-through and non-see-through. The
non-see-through ones allow the user to see only the images on the display
devices; the see-through HMDs, however, allow users to "see through" the display
and see the outside world (e.g., images are shown by half-mirrors). This way,
the computer-generated images and outside real-world images can be shown
together for augmented reality applications. See Figure 6.27.

Figure 6.27. Various HMDs. Left: Non-see-through, Right: See-through.



Implementation 

In all major types of display systems we have described, the generation of
the stereo effect requires the generation of appropriate images for the left and
right eye (see Figure 6.28). For instance, one simple method to create a pair of
left and right images is to take two photos or two movies in which the camera
positions for the respective images are separated by an approximation of the
Interpupillary Distance (IPD) (because different people have different IPDs,
an average value of about 6.5 cm is normally used). Graphic rendering can
likewise be based on two view volumes set up as shown in Figure 6.14,
separated by the average IPD in the x-direction of the camera coordinate
system. For computer-generated images, the IPD value can be a
variable in the system, thus customizing the view volume model according
to the respective user.
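
The idea of separating the two viewpoints by the IPD along the camera's x-direction, with the IPD kept as a per-user variable, might be sketched as follows. This is an illustrative helper only, independent of any particular graphics library: `stereo_eye_positions` is a hypothetical name, positions are plain 3-tuples in metres, and `right_axis` is assumed to be the unit vector of the camera's x-axis expressed in world coordinates.

```python
def stereo_eye_positions(center, right_axis, ipd=0.065):
    """Return (left_eye, right_eye) positions, each offset by half the IPD
    from the central viewpoint along the camera's x-axis (a unit vector)."""
    half = ipd / 2.0
    left_eye = tuple(c - half * r for c, r in zip(center, right_axis))
    right_eye = tuple(c + half * r for c, r in zip(center, right_axis))
    return left_eye, right_eye

# Viewpoint 1.7 m above the origin, camera x-axis aligned with world x.
left_eye, right_eye = stereo_eye_positions((0.0, 1.7, 0.0), (1.0, 0.0, 0.0))
print("left eye :", left_eye)
print("right eye:", right_eye)

# A user-specific IPD simply replaces the default average value.
narrow_left, narrow_right = stereo_eye_positions(
    (0.0, 1.7, 0.0), (1.0, 0.0, 0.0), ipd=0.058
)
```

Each eye position would then be used as the origin of its own view volume when rendering the left and right images.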

Other Display Systems 

There is a variety of standard display systems designed to promote higher 
immersion or customized for particular tasks. One natural variation is using 
large or multiple screens to offer a wide field of view. Figure 6.29 shows a 
system originally conceived at the University of Illinois at Chicago (called the
CAVE) in the early 1990s [Cru93]. Ideally, a small cubical room is con- 
structed with each of the sides serving as a large projection screen. The user 
is literally totally immersed in 360 degrees. Both passive and active stereo 
systems can be employed. Due to the high cost of building such systems 
(a six-sided CAVE with passive stereo would normally require 12 projectors 
and mirrors), only a few sides are sometimes used (e.g., just four for right, 
center, left, and the bottom; see Figure 6.29). 

Rather than using multiple tiles of rectangular and upright screens, specially
made large spherical or cylindrical screens can be used also. Most
often multiple projectors are used because one projector is not enough to 
cover the whole area. Because the projection surface is curved, the image is 
pre-distorted or warped so that it appears correct on the curved display 
surface. Note that as the screen gets larger, the viewing distance must 
become larger to keep the whole surface in view. Such display systems are 
often used for large-scale displays for a large group of people, for instance, in 
a theaterlike setting, in which immersive viewing is the primary purpose 
(with minimal interaction). 

Figure 6.30 shows the workbench type of display where one projection
surface is used in a tablelike manner using a mirror underneath. Such table
displays are very suitable for tasks such as painting, surgery, operation
planning, and so on. Compared to the large-scale spherical or cylindrical
displays, CAVE or workbenches are most suitable when close-range
interaction is required.

    int main(int argc, char *argv[])
    {
        /* set up views */
        left  = pfNewChan(p);
        right = pfNewChan(p);

        /* left channel: offset by half the interocular distance */
        pfSetVec3(hprOffsets, -eyeAngle, 0.f, 0.f);
        pfSetVec3(xyzOffsets, -Iod/2.f, 0.f, 0.f);
        pfChanViewOffsets(left, xyzOffsets, hprOffsets);

        /* right channel: mirrored offsets */
        pfSetVec3(hprOffsets, eyeAngle, 0.f, 0.f);
        pfSetVec3(xyzOffsets, Iod/2.f, 0.f, 0.f);
        pfChanViewOffsets(right, xyzOffsets, hprOffsets);

        ...
    }

    static void DrawChannel(pfChannel *channel, void *left)
    {
        /* draw and switch between views */
        if (Shared->stereo) {
            if (*(int *)left) {
                glDrawBuffer(GL_BACK_LEFT);
            } else {
                glDrawBuffer(GL_BACK_RIGHT);
            }
        }
    }

(a)

Figure 6.28a. Example code fragments for setting up stereoscopic display rendering
using Performer 2: (a) time multiplexed;

2 Performer is a registered trademark of SGI Corporation.

    int main(int argc, char *argv[])
    {
        /* set up views */
        for (loop = 0; loop < 2; loop++)
        {
            chan[loop] = pfNewChan(p);
            pfChanTravFunc(chan[loop], PFTRAV_DRAW, DrawChannel);
            pfChanScene(chan[loop], scene);
            pfChanNearFar(chan[loop], 1.0f, 10.0f * bsphere.radius);
            pfChanFOV(chan[loop], 45.0f, 0.0f);
            fstats = pfGetChanFStats(chan[loop]);
            pfFStatsClass(fstats, PFSTATS_ENGFX, PFSTATS_ON);
        }

        /* offset each channel and give it one half of the window */
        pfSetVec3(xyz, 2.0f, 0.0f, 0.0f);
        pfChanViewOffsets(chan[0], xyz, hpr);
        pfSetVec3(xyz, -2.0f, 0.0f, 0.0f);
        pfChanViewOffsets(chan[1], xyz, hpr);
        pfChanViewport(chan[0], 0.0, 0.5, 0.0, 1.0);
        pfChanViewport(chan[1], 0.5, 1.0, 0.0, 1.0);

        /* display to two view channels */
        for (loop = 0; loop < 2; loop++)
        { pfChanView(chan[loop], view.xyz, view.hpr); }
    }

(b)

Figure 6.28b. Continued (b) HMD.

Figure 6.31 shows a display system for a flight simulator setup. The flight
simulator creates a unique situation where most objects in the display are far
away. A special optic system can be created where the distance from the
magnifying glass to the image is equal to the focal length of the magnifying
lens. This way the image is perceived as being at infinity. Such an image is
called a collimated image. In this scheme, the need for binocular stereoscopy
is not significant because its effect is negligible for objects far away.
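
Why placing the image at the focal length pushes it to infinity can be seen from the standard thin-lens relation. The derivation below is a sketch assuming an ideal thin lens, with symbols introduced here for illustration: s_o is the distance from the lens to the displayed image, s_i the distance to the image formed by the lens, and f the focal length.

```latex
\frac{1}{s_o} + \frac{1}{s_i} = \frac{1}{f}
\quad\Longrightarrow\quad
\frac{1}{s_i} = \frac{1}{f} - \frac{1}{s_o}
\qquad\text{and, for } s_o = f:\qquad
\frac{1}{s_i} = 0
\quad\Longrightarrow\quad
|s_i| \to \infty
```

With the displayed image exactly at the focal length, the rays leaving the lens are parallel, so the eye accommodates as if the scene were very far away.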

Human Aural System 

The principle in designing (setting the configuration of) a visual display can 
be applied equally to an aural display. That is, one must have a basic 
understanding of how human aural systems work, and what important 
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PROJECTORS 



Figure 6.21. Using light polarization to produce stereo. 

right would have its polarization axis aligned vertically. The audience wears 
glasses that have a Polaroid filter for each eye. The result of this arrangement 
of projectors and filters, is that the left eye sees the movie that is projected from 
the right projector and the right eye sees the movie that is projected from the 
left projector (see Figure 6.22). This gives the viewer the perception of depth. 
Note that if only one display system is being used (for instance, rather than 
two projectors as shown in the figure), the interlacing technique can be used to 
display the right and left images simultaneously. That is, the even scan lines 
are used to render the right image and the odd for the left (see Figure 6.23). 
Note that this would reduce the image resolution in half. 

Time Multiplexing (Active Stereo) 

The next popular method of creating a nonautostereoscopic system is called 
the time multiplexed method. The methods so far produced left and right 
images that are actually rendered simultaneously and filtered for viewing. The 



Interlaced 
Format 




Figure 6.22. Taking advantage of interlacing to display the right and left image 
simultaneously. (Courtesy of Namgyu Kim.) 
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Right 




Figure 6.23. Time multiplexed stereo using a color wheel. 

time multiplexed systems switch very fast between the left and right images, 
and render only one image at a time. This system is also often called the active 
system as opposed to the polarized glass approach most referred to as the 
passive system. Figure 6.23 shows an example of a time multiplexed system 
using a color wheel that is appropriately designed and turns, in synchroniza- 
tion with the display, at a rate so that each eye only sees the appropriate image. 

A more elegant solution, but based on the same principle, is through the 
use of shutter glasses. The user wears electronic glasses that operate in 
synchronization with the display system in that when the left image is 
shown, the right part of the glass is blocked (so that only the left eye gets 
the left image) and vice versa. The glasses (or the shutter glasses) are usually 
made of an LCD so that the "blocking" can be achieved by making the LCD 
screen/window "black" by a synchronized electric signal (see Figure 6.24). 




Figure 6.24. Time multiplexed stereo display system using shutter glasses. 
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Because the switching of the image is very fast, the user is not aware that 
only one image is actually being rendered at a time. In fact, the switching 
frequency must be set at around 60 Hz for each eye, same as the Critical 
Flickering Frequency (CFF). Thus the display system must be able to support 
about 120 Hz of refresh rate to support time multiplexed stereo systems. 

Head-Mounted Display 

Finally, the third major nonautostereoscopic display system used for VR is 
the Head-Mounted Display (HMD). HMD usually employs two separate 
display devices designed to provide isolated display for each eye (see Figure 
6.27; note that there are HMDs made of only one device that extract two 
channels of display for each eye through the use of optics). Two separate 
(synchronized) images (for the right and left eye, respectively) are generated 
and fed into the respective display channel to create the stereo effect. The 
display isolation also creates the feeling of immersion aside from the stereo- 
scopic effect, another important factor in creating effective VR applications. 
Unlike large displays such as monitors or projectors, HMDs are worn on the 
head (in order to provide isolated images to the eyes). As for the display 
device itself, miniature LCDs or CRTs are used. The small images that 
appear on these LCDs or CRTs are magnified through the use of optics. 

Simple magnifiers (Figure 6.25) are problematic because the magnifying 
lens often cannot be placed close to the image to produce a wide FOV due to 
other image-enhancing modules. The eyes must be positioned quite closely at 
a certain spot from the magnifying lens in order receive a bright-enough 
image. This can cause strain on the eyes even though the exit pupil (see 
below) is quite forgiving. Instead, a compound microscope HMD design is 
often used as shown in Figure 6.26. A second lens is used to produce an 
intermediate image. This combination produces a small range of distance 
(called the exit pupil) at which the eyes can be positioned to receive most of 




Figure 6.25. A simple magnifier design for HMD Design. 
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First lens 



Figure 6.26. The compound microscope HMD design. 

the images. To be precise, the exit pupil (or eye motion box) is the area where 
the eye can be placed in order to see the full display. If the eye is outside the 
exit pupil then the full display will not be visible. The use of the second lens 
also allows for a wider realizable FOV as the second lens can be placed very 
close to the intermediate image. 

The lens close to the eye is usually adjustable so that the exit pupil can be 
adjusted to the position of the eyes for user convenience. The exit pupil 
formed in this way is usually located farther than the sweet spot of the simple 
magnifiers, thus causing less eyestrain. However, the exit pupil is not so 
forgiving and the right positioning of the eye becomes more important. On 
the other hand, to allow users with glasses to use the HMD, there must be 
enough distance between the eye and the closest optical element from the eye. 
This distance is called the eye relief. An eye relief of 25 mm is usually known 
be the minimum for use with eyeglasses. (If the HMD is focusable such that 
eyeglasses are not required, then the eye relief can be less.) Generally the 
greater the eye relief, the smaller the exit pupil will be [CRL04]. 

HMDs are nice because they provide images "isolated" from the external 
world (the eyes only see the images and not the outside world) and are often 
coupled with head-tracking to provide viewpoint-dependent display. This 
results in a much higher sense of immersion and presence. However, most 
HMDs suffer from narrow fields of view and heavy weight (not to mention 
being tethered to the computer). Combined with the fundamental problem of 
how stereo display systems cause inconsistency between the eye's accommo- 
dation and convergence, it poses a big human factors problem. Other prob- 
lems with the HMD include the imprecise location of the eye that changes 
during the period of user wearing (with respect to the exit pupil and eye relief). 
In addition, the eyes may not be positioned in the center with respect to the 
image planes (screens) nor their direction perpendicular to them. 

HMDs come in two varieties: see-through and non-see-through. The
non-see-through ones allow the user to see only the images on the display devices,
whereas the see-through HMDs allow users to "see through" the display
and see the outside world (e.g., images are shown by half-mirrors). This way,
the computer-generated images and outside real-world images can be shown
together for augmented reality applications. See Figure 6.27.

Figure 6.27. Various HMDs. Left: non-see-through. Right: see-through.



Implementation 

In all major types of display systems we have described, producing the
stereo effect requires generating appropriate images for the left and right
eye (see Figure 6.28). For instance, one simple method to create a pair of left
and right images is to take two photos or two movies in which the camera
positions for the respective images are separated by an approximation of the
interpupillary distance (IPD). Because different people have different IPDs,
an average value (approximately 6.5 cm) is normally used. Graphic rendering can
likewise be based on two view volumes set up as shown in Figure 6.14,
separated by the average IPD in the x-direction of the camera coordinate
system. For computer-generated images, the IPD value can be a
variable in the system, so that the view volume model is customized
to the respective user.
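
As a minimal sketch of the idea above, the two eye (camera) positions can be derived from a single head position by shifting half the IPD to either side along the camera's x-axis. The type Vec3, the constant AVERAGE_IPD_M, and the function stereo_eye_positions are hypothetical names introduced only for this illustration; they do not belong to any library discussed in this chapter.

```c
/* Hypothetical 3D vector type used only for this illustration. */
typedef struct { double x, y, z; } Vec3;

/* Average IPD quoted in the text (about 6.5 cm), expressed in meters. */
#define AVERAGE_IPD_M 0.065

/* Compute the left and right eye positions from a head position.
 * right_axis is assumed to be the unit x-axis of the camera coordinate
 * system expressed in world coordinates; ipd is the per-user value. */
void stereo_eye_positions(Vec3 head, Vec3 right_axis, double ipd,
                          Vec3 *left_eye, Vec3 *right_eye)
{
    double half = 0.5 * ipd;

    left_eye->x = head.x - half * right_axis.x;
    left_eye->y = head.y - half * right_axis.y;
    left_eye->z = head.z - half * right_axis.z;

    right_eye->x = head.x + half * right_axis.x;
    right_eye->y = head.y + half * right_axis.y;
    right_eye->z = head.z + half * right_axis.z;
}
```

Passing a measured IPD instead of AVERAGE_IPD_M is what customizes the view volumes to a particular user; each resulting eye position would then serve as the viewpoint of its own view volume.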

Other Display Systems 

There is a variety of standard display systems designed to promote higher
immersion or customized for particular tasks. One natural variation is using
large or multiple screens to offer a wide field of view. Figure 6.29 shows a
system (called the CAVE) originally conceived at the University of Illinois at
Chicago in the early 1990s [Cru93]. Ideally, a small cubical room is
constructed with each of the sides serving as a large projection screen. The user
is then fully immersed in 360 degrees. Both passive and active stereo
systems can be employed. Due to the high cost of building such systems
(a six-sided CAVE with passive stereo would normally require 12 projectors
and mirrors), only a few sides are sometimes used (e.g., just four for right,
center, left, and the bottom; see Figure 6.29).





Rather than using multiple tiles of rectangular and upright screens,
specially made large spherical or cylindrical screens can also be used. Most
often multiple projectors are used because one projector is not enough to
cover the whole area. Because the projection surface is curved, the image is
pre-distorted or warped so that it appears correct on the curved display
surface. Note that as the screen gets larger, the viewing distance must
become larger to keep the whole surface in view. Such display systems are
often used as large-scale displays for a large group of people, for instance, in
a theater-like setting, in which immersive viewing is the primary purpose
(with minimal interaction).

Figure 6.30 shows the workbench type of display where one projection 
surface is used in a tablelike manner using a mirror underneath. Such table 
displays are very suitable for tasks such as painting, surgery, operation 



int main (int argc, char *argv[]) {

    /* set up views */
    left = pfNewChan(p);
    right = pfNewChan(p);

    pfSetVec3(hprOffsets, -eyeAngle, 0.f, 0.f);
    pfSetVec3(xyzOffsets, -Iod/2.f, 0.f, 0.f);
    pfChanViewOffsets(left, xyzOffsets, hprOffsets);

    pfSetVec3(hprOffsets, eyeAngle, 0.f, 0.f);
    pfSetVec3(xyzOffsets, Iod/2.f, 0.f, 0.f);
    pfChanViewOffsets(right, xyzOffsets, hprOffsets);

    ... }

static void DrawChannel (pfChannel *channel, void *left)
{
    /* draw and switch between views */
    if (Shared->stereo) {
        if (*(int*)left) {
            glDrawBuffer(GL_BACK_LEFT); }
        else {
            glDrawBuffer(GL_BACK_RIGHT); }
    }
}

(a) 

Figure 6.28a. Example code fragments for setting up stereoscopic display rendering 
using Performer 2 : (a) time multiplexed; 



2 Performer is a registered trademark of SGI Corporation. 

int main (int argc, char *argv[])
{
    /* set up views */
    for (loop = 0; loop < 2; loop++)
    {
        chan[loop] = pfNewChan(p);
        pfChanTravFunc(chan[loop], PFTRAV_DRAW, DrawChannel);
        pfChanScene(chan[loop], scene);
        pfChanNearFar(chan[loop], 1.0f, 10.0f * bsphere.radius);
        pfChanFOV(chan[loop], 45.0f, 0.0f);
        fstats = pfGetChanFStats(chan[loop]);
        pfFStatsClass(fstats, PFSTATS_ENGFX, PFSTATS_ON);
    }

    pfSetVec3(xyz, 2.0f, 0.0f, 0.0f);
    pfChanViewOffsets(chan[0], xyz, hpr);
    pfSetVec3(xyz, -2.0f, 0.0f, 0.0f);
    pfChanViewOffsets(chan[1], xyz, hpr);
    pfChanViewport(chan[0], 0.0, 0.5, 0.0, 1.0);
    pfChanViewport(chan[1], 0.5, 1.0, 0.0, 1.0);

    /* display to two view channels */
    for (loop = 0; loop < 2; loop++)
    { pfChanView(chan[loop], view.xyz, view.hpr); }
}

(b) 

Figure 6.28b. Continued (b) HMD. 



planning, and so on. Compared to the large-scale spherical or cylindrical 
displays, CAVE or workbenches are most suitable when close-range inter- 
action is required. 

Figure 6.31 shows a display system for a flight simulator setup. The flight
simulator creates a unique situation where most objects in the display are far
away. A special optic system can be created where the distance from the
magnifying lens to the image is equal to the focal length of the
lens. This way the image is perceived as being at infinity. Such an image is called a
collimated image. In this scheme, binocular stereoscopy is largely
unnecessary because binocular disparity is negligible for objects far away.

Human Aural System 

The principle in designing (setting the configuration of) a visual display can 
be applied equally to an aural display. That is, one must have a basic 
understanding of how human aural systems work, and what important 

int main(int argc, char **argv) { ...
    glutDisplayFunc(HandleDisplay);
    ... }

void HandleDisplay(void) { ...

    /* Mask Definition (left eye) */
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    switch (glassestype) {
    case REDBLUE:
    case REDGREEN:
    case REDCYAN:
        glColorMask(GL_TRUE, GL_FALSE, GL_FALSE, GL_TRUE);
        break;
    case BLUERED:
        glColorMask(GL_FALSE, GL_FALSE, GL_TRUE, GL_TRUE);
        break;
    case GREENRED:
        glColorMask(GL_FALSE, GL_TRUE, GL_FALSE, GL_TRUE);
        break;
    case CYANRED:
        glColorMask(GL_FALSE, GL_TRUE, GL_TRUE, GL_TRUE);
        break; }

    /* Camera Setting Left Eye */
    gluLookAt(camera.vp.x - right.x,
              camera.vp.y - right.y,
              camera.vp.z - right.z,
              focus.x, focus.y, focus.z,
              camera.vu.x, camera.vu.y, camera.vu.z);
    CreateWorld();
    glFlush();

    /* Mask Definition (right eye); each case ends with a break so that
       only the mask for the selected glasses type is applied */
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    switch (glassestype) {
    case REDBLUE:
        glColorMask(GL_FALSE, GL_FALSE, GL_TRUE, GL_TRUE);
        break;
    case REDGREEN:
        glColorMask(GL_FALSE, GL_TRUE, GL_FALSE, GL_TRUE);
        break;
    case REDCYAN:
        glColorMask(GL_FALSE, GL_TRUE, GL_TRUE, GL_TRUE);
        break;
    case BLUERED:
    case GREENRED:
    case CYANRED:
        glColorMask(GL_TRUE, GL_FALSE, GL_FALSE, GL_TRUE);
        break; }

    /* Camera Setting Right Eye */
    gluLookAt(camera.vp.x + right.x,
              camera.vp.y + right.y,
              camera.vp.z + right.z,
              focus.x, focus.y, focus.z,
              camera.vu.x, camera.vu.y, camera.vu.z);
    CreateWorld();
    glFlush();
}



(c) 



Figure 6.28c. Continued (c) passive. (See companion CD for more details; courtesy of 
Namgyu Kim.) 



parameters there are in order to match them as closely as possible with the 
aural display system. Figure 6.32 shows the anatomy of the human ear. 

Sound waves cause the tympanic membrane (eardrum) to vibrate. The 
three bones in the ear (malleus, incus, stapes) pass these vibrations on to the 
cochlea. The cochlea is a snail-shaped, fluid-filled structure in the inner ear. 
Hair cells are located on the basilar membrane of the cochlea. When the hair 
cells are excited by vibration, a nerve impulse is generated in the auditory 
nerve. These impulses are then sent to the brain. 

The next question is how humans perceive 3D sound, or the directionality of
sound. The brain uses three major properties of the sound in order to locate
its direction of origin. It was initially thought that the loudness of the
sound (its amplitude) played the most important role in sound localization.
However, it was later found that the point the wave has reached within its
vibration cycle when it arrives at each ear (its phase difference) also plays
an important role.
That is, the sound wave heard by the right ear will be slightly different in
timing, compared to that heard by the left ear (akin to binocular disparity).
This slight difference in timing (or phase) helps humans to locate the origin
of the sound. This observation led to the invention of stereo sound (two
speakers with a phase-controlled sound source) and surround sound systems
(multiple speakers with a phase-controlled sound source). Note that two
persons who hear the same sound would interpret the sound a bit differently



Figure 6.29. The CAVE-like display system: (a) the front view; (b) the projector and 
mirror in the back; (c) another view of the projector and mirror in the back. 
(Courtesy of Bo H. Cho.) 



as the phase differences felt by the two ears will be different due to differ- 
ences in the shapes of their ears reflecting the sound wave into their ears. 

Later, a more refined theory of 3D sound perception was developed.
According to this theory, 3D sound perception is based on the difference in energy
distribution in the frequency domain of the sound waves [Kra01]. This
led to the concept of the Head-Related
Transfer Function (HRTF) and to HRTF-based technology that produces
spatial sound with new energy distributions according to the (possibly
varying) locations of the sound sources (for the right and left ear; see Figure
6.33). To be precise, to find the sound pressure that an arbitrary source x(t)
produces at the eardrum, we can physically measure the "impulse response,"
h(t), from the source to the eardrum. This is called the Head-Related
Impulse Response (HRIR), and its Fourier transform, H(f), is the
head-related transfer function. The HRTF captures the energy distribution in the
frequency domain of the sound from a particular location, and it can be
used to reproduce or synthesize binaural signals from a monaural source (for
the right and left ear). Note that the HRTF is a function of the location of

Figure 6.30. A person closely interacting with a workbench style display system.
(Courtesy of J. Hwang.)

Figure 6.31. Using a Fresnel lens to create a collimated display system for a flight
simulator.

Figure 6.32. Anatomy of the human ear. (Adapted from [Ber97] with permission
from Houghton Mifflin © 2004.)

the sound source. Thus, for a faithful reproduction of 3D sound the user
location must be known or tracked (using separate sensors), and used as
input to the sound synthesis system. The HRTF approach also suffers from
the fact that each person has differently shaped ears that distort the energy
distribution differently (that is, an HRTF that works for one person will
work slightly less well for another). Thus, HRTFs are measured in laboratories, in

Figure 6.33. 3D sound systems use a frequency response function synthesized from
the head-related transfer functions. It is used to modify the energy distribution so
that sound from the speakers is perceived as coming from the virtual source.
(Adapted from [Kra01] with permission from IEEE © 2004.)

addition to different locations, for different population and age groups so
as to personalize the 3D sound system as much as possible [Dud00].
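
In the time domain, applying an HRTF amounts to convolving the monaural source x(t) with the measured impulse response h(t) for each ear. The following is a minimal sketch of that step for sampled signals; the function names and the idea of passing one HRIR array per ear are assumptions for illustration, and a real system would use measured HRIRs for the current source location and typically a faster (FFT-based) convolution.

```c
#include <stddef.h>

/* Direct-form discrete convolution y = x * h.
 * The output buffer y must have room for nx + nh - 1 samples. */
void convolve(const float *x, size_t nx,
              const float *h, size_t nh, float *y)
{
    size_t i, j;

    if (nx == 0 || nh == 0)
        return; /* nothing to compute */

    for (i = 0; i < nx + nh - 1; ++i)
        y[i] = 0.0f;

    for (i = 0; i < nx; ++i)
        for (j = 0; j < nh; ++j)
            y[i + j] += x[i] * h[j];
}

/* Synthesize a two-channel (binaural) signal from a monaural source by
 * filtering it with the left-ear and right-ear HRIRs (hypothetical
 * inputs; both are assumed to have the same length nh). */
void render_binaural(const float *mono, size_t n,
                     const float *hrir_left, const float *hrir_right,
                     size_t nh, float *out_left, float *out_right)
{
    convolve(mono, n, hrir_left, nh, out_left);
    convolve(mono, n, hrir_right, nh, out_right);
}
```

When the source or the listener moves, the pair of HRIRs is swapped for the pair measured at the new relative location, which is why tracking the user matters for faithful reproduction.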

The sounds generated with the help of the HRTF are played directly
to the ears of the listener using headphones (two speakers are enough to
simulate the 3D sound using HRTFs for the right and the left ears),
producing the 3D localization cue. The 3D sound produced by
general-purpose sound cards is best heard with headphones
rather than a set of speakers. This is because the HRTFs used
in the sound cards are usually sampled with microphones located in the two
ears of a dummy head model. This in turn makes the use of multichannel
speakers difficult because the synthesized sound is based on two channels.
However, using headphones also creates what is known as the
Inside-the-Head Localization (IHL) phenomenon, a false impression that a sound
is emanating from inside the user's head [Ken95].

Aural Display Systems (3D Sound) 

The simplest 3D aural display would be one based on sound amplitude
control. This would require two or more speakers and a capability to send
synchronized sound streams with different amplitudes. The amplitudes
would be adjusted according to the location of the sound with respect to
the listener. For instance, the total volume can be set up to diminish linearly
according to the distance from the sound source, and the relative volumes
(panning) are computed from the relative orientations ($\theta$) of the sound
sources (e.g., other participants) according to a cosine function. That is,

$$\mathit{RightVolume} = \frac{\mathit{RightMax}}{2} + \cos\theta \cdot \frac{\mathit{RightMax}}{2}$$

$$\mathit{LeftVolume} = \mathit{RightMax} - \mathit{RightVolume}$$

where RightMax is the maximum volume for the right channel and $\theta$ is the
relative angle between the user and the sound source (e.g., if there is a
sound source at the right side, $\theta$ is 0 degrees; if in the front, 90 degrees).

Most sound cards are capable of producing stereo or surround sound.
Ideally, stereo or surround sound requires sounds recorded in stereo or
surround with two or more audio streams that differ in their
phases. Such pre-recorded stereo sound will sound different when heard at
different locations, when speaker locations are different from the original
recording settings, and when heard by different people. Stereo alone thus
provides only a few directional cues. Moreover, the usual sound cards do
not provide the capability to control the phases of sound streams in an
intricate way. In fact, to the author's knowledge, direct low-level
programming of the sound cards is nearly impossible for all practical purposes (not
open to the general public). Instead, today's PC sound cards (and programming
APIs) are equipped with a set of HRTFs and a capability to deliver
HRTF-controlled sound streams for changing locations of sound sources.

Most sound programming libraries offer functionalities to simply play or
record sound clips and change some sound characteristics. DirectSound, 3 a
part of DirectX 3 available from Microsoft, offers ways to play sounds on
PCs (with most sound cards) with very low latency, along with other sound-related
functionalities including 3D sound, simultaneous playback of multiple sounds,
sound effects such as the Doppler effect and echo, and even recording. Figure 6.34 shows the
three major sound systems for virtual reality. Figure 6.35 shows a code
sample for setting up and delivering 3D sound using the DirectSound APIs.

In terms of types of sounds used for display, three broad sound types can 
be identified. In the most ideal case, the sound display should be as real as 
possible employing physically based simulated sounds or something 
recorded from the real world and played back. Iconic sounds are often 
used for practical purposes instead of the real and natural "as is" sounds. 
For instance, if an environment has many doors, one might record only one 
sample of a door opening or closing sound to be used for all the doors (even 
though each door opening or closing would be slightly different in actuality). 
Finally, the simplest sound display might employ just a beeplike sound to 
indicate its presence, while not providing any cue to what it might represent. 



3 DirectSound and DirectX are registered trademarks of Microsoft Corporation.

Figure 6.34. Various sound systems: (a) stereo; (b) 5.1 surround; (c) HRTF-based
3D sound. (Adapted from [Kra01] with permission from IEEE © 2004.)

/* set up sound source */

HRESULT C3DSound::Setup(int env_type, WCHAR *pwsFileName,
                        IDirectMusicPerformance8 *pPerformance,
                        IDirectMusicLoader8 *pLoader) {

    m_pLoader->LoadObjectFromFile(CLSID_DirectMusicSegment, IID_IDirectMusicSegment8,
                                  pwsFileName, (LPVOID*)&m_pSegment);

    /* 3D audio path and its 3D sound buffer */
    m_pPerformance->CreateStandardAudioPath(DMUS_APATH_DYNAMIC_3D, 64, TRUE, &m_p3DAudioPath);
    m_p3DAudioPath->GetObjectInPath(DMUS_PCHANNEL_ALL, DMUS_PATH_BUFFER, 0, GUID_NULL, 0,
                                    IID_IDirectSound3DBuffer8, (LPVOID*)&m_pDSB);

    /* mono audio path and its sound buffer */
    m_pPerformance->CreateStandardAudioPath(DMUS_APATH_DYNAMIC_MONO, 64, TRUE, &m_p3DAudioPath);
    return m_p3DAudioPath->GetObjectInPath(DMUS_PCHANNEL_ALL, DMUS_PATH_BUFFER, 0, GUID_NULL, 0,
                                           IID_IDirectSoundBuffer8, (LPVOID*)&m_pDSB);
}

/* set volume */

void C3DSound::setVolume(long volume) {
    m_p3DAudioPath->SetVolume(volume, 0);
}

/* play */

HRESULT C3DSound::Play(bool bLoop) {
    m_pSegment->SetRepeats(DMUS_SEG_REPEAT_INFINITE);
    return m_pPerformance->PlaySegmentEx(m_pSegment, NULL, NULL, DMUS_SEGF_SECONDARY, 0,
                                         NULL, NULL, m_p3DAudioPath);
}

/* set position of sound source */

HRESULT C3DSound::setPos(float fX, float fY, float fZ) {
    return m_pDSB->SetPosition(fX, fY, fZ, DS3D_IMMEDIATE);
}

/* set position of listener */

HRESULT C3DSound::setListenerPos(float fX, float fY, float fZ) {
    return m_pListener->SetPosition(fX, fY, fZ, DS3D_IMMEDIATE);
}

/* set orientation of listener */

HRESULT C3DSound::setListenerOri(float vx, float vy, float vz, float ux, float uy, float uz) {
    return m_pListener->SetOrientation(vx, vy, vz, ux, uy, uz, DS3D_IMMEDIATE);
}

Figure 6.35. An example of programming with the DirectSound API.

Haptics: Force and Tactile Feedback 

The word "haptics" refers to the sense of touch, and can be subdivided into
two subfields: force (kinesthetic) and tactile feedback. Force feedback
displays interact with the muscles and tendons to give the human a sensation of
a force being applied. Humans rely on their haptic sense in exploring
environments in which there is poor or no visibility. For instance, a peg can be
inserted into a hole by feeling for the surface and the chamfers of the
hole. But in most cases, haptics work best when used with other modalities
such as the visual and aural display.

Most force feedback devices are in the form of robotic devices, such as
high degree-of-freedom (DOF) manipulators and exoskeleton
mechanisms, low-DOF force feedback joysticks, or motion platforms that can
generate and stimulate a user with the various types of forces at the point
of interaction (called the virtual proxy). Tactile feedback is explained in the
next section.

Haptic devices can also be categorized as active and passive. Active 
devices generate force feedback to be exerted on the human user (e.g., a 
manipulator or motion platform). Passive devices provide haptic cues solely 

Figure 6.36. Example of use of passive haptics (props): (a) the rectangular prop
represents the mobile phone. The prop has 10 yellow markers for vision-based
registration purposes and two wireless switches for interaction; (b) what is seen by
the user through the head-mounted display [LeeSY04]. (Reprinted with permission
from IEEE © 2004.) Labels in the figure: Fold/Unfold; Power on/off.

by their physical existence (see Figure 6.36). That is, humans can obtain
"passive" feedback by grabbing or colliding against an inactive object, such
as when using props and real-life objects as interaction objects. Passive haptics
can sometimes be an attractive option because active haptics employ
expensive devices that can clutter the visual display, whereas passive props are
relatively safe (no active parts) and easy to use (metaphorically shaped). Most haptic
devices provide only very limited force feedback, such as point force feedback
(e.g., through the end effector of the manipulator) or low degrees of
freedom (no rotational force). To deliver a large amount of force, the device
must carry sufficient mass, which makes the device less safe and more
cumbersome to use (less mobile, obstructs visual displays). Thus, it is
difficult to predict whether there will ever be a "natural" haptic system as envisioned
and portrayed in science fiction (unless we can find a way to directly
stimulate and control the sensorimotor system of the brain). To summarize,
haptics can find good use in a limited manner for very specific tasks in which
force feedback is critical for efficient task completion or virtual experience
[Smi04].

Haptic Display and Implementing Haptics 

As with any display system, to correctly design a haptic interface the
anatomy and physiology of the body must be taken into consideration. The
details of such human factors issues and the design of ergonomic mechanisms
are beyond the scope of this book, and there has been little work on the
ergonomic design of haptic devices. Figure 6.37a shows
a ground-based manipulator type of force feedback device. Ground-based

Figure 6.37. Various haptic displays: (a) grounded; (b) ungrounded. (Courtesy of 
Kwang H. Ko and Luke Shih.) 

devices are solidly connected to the ground, whereas ungrounded devices
are usually worn on the body like an exoskeleton (see Figure 6.37b). These
are basically robotic devices that are actuated by electric motors or
hydraulic/pneumatic pumps. The important parameter to consider as a
user of the technology would be the output strength of the devices in relation
to the capabilities of the human joints. According to Kilchenman [Kil00], for
instance, a force output of 3 to 4 N is needed for size discrimination,
identification, and object detection.

The number of degrees of freedom is another important parameter. Some robotic
manipulators can deliver only three degrees of freedom (for cost reasons),
whereas human hands and arms can move in more than six degrees of
freedom; such devices thus limit the movement of the human user and create
discomfort. The output strength of robotic devices is related to their mass, and
usually the greater the mass, the greater the size. Large haptic devices
either clutter the visual display or are heavy to use if worn on the body. Light
and small haptic devices can break easily. Note that haptic devices are
usually equipped with sensors (e.g., to measure the movements of the haptic
device and thus indirectly measure the motion of the user) so that their
values can be used as input. The sensors usually sense the linear and/or
rotational position, velocity, and even acceleration, and their values are used
to compute the correctional force to be applied in the next instant. In fact,
generating a "stable" 4 force requires a closed feedback loop that runs at a
very high rate (up to 1000 Hz) for "rendering the force" (i.e., sensing the user



4 According to control theory, without a sufficient update rate, the response
force can exhibit oscillation or jitter, or in the worst case, divergence.
motion, simulating/updating the scene (e.g., the object in contact), and
generating the force response (the graphic rendering may run at a lower
rate)). However, it has been reported that a feedback loop operating at as low
as 40 to 80 Hz is good enough for simple tasks such as size identification and
discrimination, and round corner detection [Kil00].

There are other haptic devices that do not resemble robotic manipulators 
or exoskeletons, but they mostly operate on similar principles: they use 
various actuators in different degrees and directions of freedom in conjunc- 
tion with sensors to generate and regulate the force output. Examples 
include the force feedback joystick, motion platforms, force-controlled bi- 
cycles and stepper machines, and the like. Recently, game pads came out 
equipped with simple force feedback actuators using electronically con- 
trolled mechanical relays that provide a very simple and crude sensation of 
inertia at the time of (virtual) contact. 

Computing the force feedback at the point of contact or interaction (or
virtual proxy) is referred to as haptic rendering. As mentioned above, haptic
rendering involves checking for the existence and location/direction of a collision
(as a separate process; see Chapter 9), reading the current values of the
sensors, and computing the value of the force feedback to be displayed at
the virtual proxy as a response to the collision. Such collision responses
may be based on physically based formulations considering the dynamic
and kinematic properties of the interaction object and the virtual proxy (e.g.,
the human hand). In many cases, however, simplistic models are used
to reduce the amount of computation (note that this computation should
ideally be done at a rate of 1000 Hz, which is much higher than the visual
update rate of about 25 Hz). One of the simplest models calculates the reactive
force as a value proportional to the collision velocity, with its direction
directly opposite to the colliding direction. Figure 6.38 illustrates the
model. Chapter 10 covers some of the basics of physical simulation and
generating collision response effects.
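
The simplest model mentioned above, a reactive force proportional to the collision velocity and pointing the opposite way, can be sketched as follows. The type Vector3, the function name, and the gain parameter are hypothetical; collision detection and sensor reading are assumed to happen elsewhere and are represented here only by the in_contact flag.

```c
/* Hypothetical 3D vector type used only for this illustration. */
typedef struct { double x, y, z; } Vector3;

/* Reactive force F = -gain * v for a proxy that is colliding with an
 * object at velocity v. Returns a zero force when there is no contact.
 * gain is an assumed tuning constant in N per (m/s). */
Vector3 reactive_force(int in_contact, Vector3 collision_velocity, double gain)
{
    Vector3 f = {0.0, 0.0, 0.0};

    if (in_contact) {
        f.x = -gain * collision_velocity.x;
        f.y = -gain * collision_velocity.y;
        f.z = -gain * collision_velocity.z;
    }
    return f;
}
```

In a haptic loop this function would be evaluated once per cycle (ideally around 1000 times per second) with the latest sensed velocity, and the result sent to the actuators.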



Stimulation of Other Modalities I 

Tactile Feedback 

Sensation of touch (e.g., texture of objects) can be important for certain 
explorative manipulation tasks such as medical palpation, where physicians 
locate hidden anatomical structures and evaluate tissue properties using 
their hands [How02]. Tactile display devices stimulate the user's skin (usu- 
ally constrained to be the fingertip) to generate these sensations of various 
types of contact. The skin responds to several distributed physical quantities; 
the most important are perhaps high-frequency vibrations (that can be 
provided using, e.g., small vibratory motors, piezo-electric materials; see 
Figure 6.39), small-scale shape or pressure distribution (can be provided 

Figure 6.38. A simple haptic rendering procedure.



e.g., by using an array of small moving pins), and thermal properties 
[How02]. 

However, most tactile display systems are still in the research stage, and
cannot yet be used effectively in a VR system. The devices, like other
haptic devices, tend to be large (and thus immobile) and
provide sensations only to a very small area (the fingertip). A number of companies
have come out with a mouse that can provide a limited amount of tactile




Figure 6.39. Vibratory tactile feedback device [Jan02]. 
sensation [Vir04], and Sony is working on computer screens (LCD) that can
provide tactile sensations using special solid-state devices [Pou02].

Although most research on tactile feedback systems has focused on tech- 
niques to exactly recreate the texture of virtual surfaces using special devices 
and materials, there also have been proposals to use tactile feedback (usually 
using vibration) as an abstract information channel in several application 
contexts. Typically, to effectively convey certain information, such tactile 
feedback systems are applied to a larger skin area (e.g., abdominal region), 
and thus are often in a wearable form. The U.S. Navy has developed a
system called the TSAS (Tactile Situation Awareness System), a tactile vest
that supports pilots' situation awareness in aerial navigation and combat [Tat00].

Proprioception 

Proprioception refers to sensations obtained from receptors in our muscles,
tendons, and joints. With signals from these organs, we are, for instance,
aware of the positions of our arms and legs even with our eyes closed. In
addition to positions, we can accurately sense the speeds, directions of
movements, and forces of our limbs as well. The sense of proprioception
creates a body image, a vivid sense of different body parts occupying space
[Ber00]. Proprioceptive feedback is important for enacting natural
interaction and task performance/learning (because we usually move our body
parts to interact and carry out tasks), and for spatial perception. When we turn
our heads, we use the proprioceptive feedback from the neck joint to encode
the space around us. However, there is no external device that can directly
stimulate our proprioceptive receptors (they would have to be connected
directly to our nerves). Instead, if possible, a VR system should employ
head-tracking and view-dependent scenes in as many directions as possible
and also strive to implement whole-body interaction that utilizes as many
limbs as possible for close-range 3D interaction. This helps the user to
build a better perception of space and feel a higher sense of presence in the virtual
environment.

Vestibular Sense 

The vestibular sense is the sense of acceleration and balance. The vestibular 
organ is the semicircular canal located within the inner ear which is filled 
with fluid that moves under acceleration or deceleration (but not under 
constant velocity). That fluid movement provides the cues for perceptions 
about gravity, acceleration, and balance. Similarly to proprioception, it is 
difficult to provide the right vestibular sense externally consistent with what 
happens in the virtual environment (one has actually to run, walk, make 
sudden moves, or tilt one's body on one's own). Rather, the fact that 
humans possess this vestibular sense is a source of discomfort. Similarly to 
the phenomenon where humans feel discomfort from the inconsistency
between convergence and accommodation when looking at (artificial)
stereoscopic images, the coupling between the vestibular and visual sense is 
very strong. One of the typical sicknesses induced in a VR setting often 
happens in virtual navigation where the user receives cues of moving 
through the virtual environment through the visual channel, whereas in 
reality, one is standing still in the physical real world (thus having no 
vestibular sense of moving). Such inconsistency creates sickness and discom- 
fort that are very difficult to get around. Ironically, making the VR system 
induce higher presence (e.g., by using more immersive displays, rich inter- 
action, and so forth) is known to make this problem worse. On the other 
hand, it is possible to reduce the effect of such sicknesses by making the user 
pay attention to certain tasks or to a storyline (although it is not clear what 
the after-effects might be). Motion platforms that generate sudden acceler- 
ated motions can deliver limited stimulation to the vestibular sense and add 
to the creation of compelling virtual navigation. 

Stimulation of Other Modalities II: Olfactory, Wind, 
Thermal, Taste 

No practical sensory display systems or devices exist for simulating the
effects of smell, wind, taste, and other more "exotic" human sensory
modalities. For instance, very little work, commercial or research, has been
done on olfactory display systems. Unlike color, a new scent cannot be
composed from a set of basic component scents, and once a scent is
diffused into the air, the problem of quickly removing the residue remains.
Researchers at the University of Central Florida [Mor03] are working on a
scent-specific technology with the objective of delivering a specific odor at a
specific time. Their system, the "ScentKiosk," is PC-controlled and provides
three different scents in small amounts near the user's nose for quick
detection without residual room odors (see Figure 6.40). Another system by the
same group, called the "ScentDome," can provide 20 scents dispersed by a
small fan. Researchers at the Institute for Creative Technologies (ICT) are
working on a concept called the "Scent Release Necklace" (see Figure 6.41) that
uses several scent cartridges controlled by a wireless interface. At the recent
IEEE Virtual Reality conference, Yanagida et al. presented a novel olfactory
display system called the "Air Cannon." The Air Cannon, placed at some distance
from the user, tracks the location of the nose with a camera and
"shoots" a portion of a small scent packet using an aerodynamic pump
system so that the portion arrives near the nose of the user [Yan04].

The sensing of the wind or air flow can be an important presence- 
promoting factor by providing a cue or medium that bridges the virtual 
and physical worlds. Moon et al. have developed an interface that simulates 
the effect of wind using 16 small computer-controlled fans [Moo04]. They 
reported that the interface along with the visual feedback improved the user- 
felt presence significantly (see Figure 6.42). 

Figure 6.40. The ScentKiosk that provides several odors directly to the user's nose.
(Reprinted with permission from the ScentAir Technologies © 2004.) 




Figure 6.41. The Scent Collar 5 under development by ICT [Mor03]. (Reprinted with 
courtesy and permission from J. Morrie and the ICT © 2004.) 



5 The Scent Collar is a development effort between USC's Institute for Creative 
Technologies and Anthrotronix, Inc. The Scent Collar development is sponsored by 
the U.S. Army Research, Development, and Engineering Command (RDECOM); 
however, the content of the information pertaining to the collar does not necessarily 
reflect the position or the policy of the government, and no official endorsement 
should be inferred. 

Figure 6.42. The wind interface developed by Moon et al. [Moo04]. (Reprinted with
permission from the ACM © 2004.) 



Although work is in progress, no "practical" displays exist, at the time of
writing this book, for simulating the effects of smell, wind, taste, and other
"exotic" human modalities. However, ingenuity can sometimes go a long way
toward overcoming the problem. Sensorama, perhaps the very first virtual
reality system, created by M. Heilig in the late 1950s, included a fan device
to simulate the wind drag effect of a motorcycle ride and a simple one-shot
device to reproduce the foul smell of New York City alleyways [Sen61]. Dinh
et al. studied the effects of multimodality on the participants' sense of
presence in a virtual environment and on their memory for the environment and
the objects in that environment. In their study, they used a coffeemaker to
produce the effect of smell, a small electric fan for the effect of wind, and
a high-energy light bulb for the thermal effect [Din99].

Summary 

Multimodal interaction is one of the defining characteristics of a true VR 
system. To design an effective multimodal display, one must first
understand how the human perceptual system works, and try to match the
display capabilities to those of the human. In addition, the type of task
to be carried out must be considered. Although humans possess five major
senses, it is the "big three" modalities of the visual, aural, and haptic
that are taken advantage of in today's typical VR system. Although display systems
for other modalities are still in the research stages, special-purpose devices 
can sometimes accomplish the desired effect. 



Pondering Points 

• Imagine a projective display system of 2 m x 3 m with the user 2 m away 
from the middle of the screen. Assuming 20/20 vision of the user, calculate 
the required resolution, FOV, and pitch size. 

• Take the HMD in your lab (or look one up on the Web), and jot down the 
important parameter values such as the FOV, resolution, exit pupil, and 
eye relief. 

• Most HMDs today suffer from a narrow field of view. Suggest a way to 
overcome this problem. 



Chapter 7 

Sensors and Input Processing 



The last chapter examined various display systems for different human 
sensory modalities used as output devices in virtual reality systems. In this 
chapter, we examine typical sensors used as input devices. Like the display 
systems, using or wearing sensors raises human factors issues and it is 
important to understand their capabilities and working principles to design 
the right interface (see Chapter 8 also). 

Sensors can be broadly divided into three types: continuous, discrete,
and combined [Bow05]. Continuous input devices are hardware devices that
sample certain physical properties or quantities of the real world, such as
position, orientation, acceleration, velocity, and pressure. Discrete
input devices generate one event at a time upon a designated user action,
such as pressing a button or making a pinch action. Continuous input
devices are usually used in combination with discrete input devices
(as with a mouse), and in conjunction with event-generating recognition 
software (such as recognition of gestures, voice commands, body move- 
ments, etc.). 

In order to reflect the user's intention back to the virtual world as soon as 
possible, the sensors must have as low a latency as possible. Latency refers to
the time between capturing certain data and delivering them to the system. This
includes the time of capture, certain processing (which may include the 
recognition process if required), and time for delivery to the system. Usually, 
the bottleneck is on the processing part, especially if it has to be done in 
software (e.g., format conversion, recognition into a discrete meaningful 
event). Apart from latency, a sensor's capability will be bounded by its 
update (or sampling) rate. This is the rate at which the sensors "sample" 
the world to produce the sensor data. The higher the update rate, the better
the temporal resolution of the world data. However, if the
latency is too large, the high update rate would be of little value. Too 
much latency or very low update rate can introduce noticeable lag in using 
virtual reality systems. In addition, sensors can introduce sizable inherent 
error and distortion and require a calibration process to minimize such 
effects. 
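
The interplay between latency and update rate can be illustrated with a small calculation. The sketch below assumes a deliberately simple model that is not taken from the text: an event that happens just after a sample is taken waits one full sampling period before it is captured, and is then delayed by the sensor latency. The function name and the three example sensors are hypothetical.

```python
def worst_case_lag_ms(latency_ms: float, update_rate_hz: float) -> float:
    """Worst-case delay between a user action and its arrival at the system.

    Assumed model: one full sampling period of waiting (the action just
    missed the previous sample) plus the capture-to-delivery latency.
    """
    if update_rate_hz <= 0:
        raise ValueError("update rate must be positive")
    sampling_period_ms = 1000.0 / update_rate_hz
    return latency_ms + sampling_period_ms


# Hypothetical sensors: (description, latency in ms, update rate in Hz)
sensors = [
    ("fast sampling, slow processing", 80.0, 240.0),
    ("slow sampling, fast processing", 5.0, 10.0),
    ("balanced", 10.0, 120.0),
]
for description, latency, rate in sensors:
    print(f"{description}: {worst_case_lag_ms(latency, rate):.1f} ms")
```

Under this model, a high update rate does little good when the latency term dominates, and a low update rate dominates the lag even when processing is fast.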

Sensors must also be carefully "registered" into the virtual world, meaning
that the relationship between their coordinate systems must be established
correctly. Because quantities in virtual space are unitless, matching them
to real-world quantities requires a mapping process. That is, when the user
moves by 100 units in virtual space, the corresponding real-world distance
has to be measured separately to establish the mapping, and this mapping
will be different for different display systems and sensors.

Trackers 

The most important (continuous) input device used for virtual reality sys- 
tems is the tracker that senses and tracks a designated position or orientation 
in the 3D space. Trackers are important because tracking 3D position and 
orientation is essential in realizing natural interaction. For desktop systems, 
we use the mouse to track positions in the (limited) 2D space. Trackers come 
in many different flavors according to how they work (which is related to the 
accuracy and amount of possible distortion), whether they are wired, degrees 
of freedom, and operating range. Table 7.1 summarizes the various types of 
trackers and categorizes them according to these characteristics. 

Magnetic trackers are composed of a source that emits a low-frequency 
magnetic field and sensors that determine their position and orientation 
relative to the magnetic field. They are relatively inexpensive with reasonable 
operating range and accuracy, but suffer from significant distortion with 
metal objects in the environment. Acoustic trackers measure the travel time
of sound waves to triangulate the position and orientation of the sensor.
Because of this operating principle, the line of sight between the sensor
and the sound wave source must be clear. Acoustic trackers are
inexpensive but usually have low accuracy and limited range.
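
As a rough illustration of the acoustic principle, the following sketch converts time-of-flight measurements into distances and recovers a 2D position from three fixed emitters. It is a simplified example under stated assumptions, not the method of any particular product: the speed of sound, the emitter layout, and all function names are hypothetical, and only position (not orientation) is estimated.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for dry air at about 20 degrees C


def distance_from_time_of_flight(seconds):
    """Distance travelled by a sound pulse in the given time."""
    return SPEED_OF_SOUND_M_S * seconds


def trilaterate_2d(emitters, distances):
    """Locate a point from its distances to three non-collinear emitters.

    Subtracting the first circle equation from the other two gives a
    2x2 linear system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = emitters
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 + y2**2 - x1**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 + y3**2 - x1**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("emitters must not be collinear")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


# Hypothetical setup: three emitters (meters) and a sensor at (1, 1).
emitters = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
true_sensor = (1.0, 1.0)
times = [math.hypot(true_sensor[0] - ex, true_sensor[1] - ey) / SPEED_OF_SOUND_M_S
         for ex, ey in emitters]
distances = [distance_from_time_of_flight(t) for t in times]
estimate = trilaterate_2d(emitters, distances)
print(estimate)
```

With noisy timing measurements the three circles no longer meet in a single point, which is one reason such trackers tend to have limited accuracy.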

Mechanical trackers rely on sensing the joint movements of mechanisms,
such as in a manipulator-like robot, and thus are highly accurate. Depending
on the mechanism used, they can be difficult to control. For instance, the
Spaceball 1 and the Magellan 2 (see Figure 7.1) are what are called isometric
devices and may require a relatively large amount of twisting or pushing
force to move a small distance. Inertial trackers compute the distance or
orientation traveled (from a known reference point) by integrating the
acceleration and angular rate values obtained from accelerometers and gyros.
Due to the nature of integration, errors start to accumulate after some time
of operation (drift errors) and the system must be reset.
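
The drift of an inertial tracker can be made concrete with a small simulation. The sketch below assumes a stationary sensor whose accelerometer reports a small constant bias; the bias value, the time step, and the function names are hypothetical. Integrating the acceleration twice turns the constant bias into a position error that grows roughly with the square of the elapsed time.

```python
def integrate_position(accelerations, dt):
    """Integrate acceleration samples twice (simple Euler steps) to get position."""
    velocity = 0.0
    position = 0.0
    for a in accelerations:
        velocity += a * dt
        position += velocity * dt
    return position


BIAS_M_S2 = 0.01  # assumed constant accelerometer bias while the sensor is at rest
DT_S = 0.01       # assumed sampling interval (100 Hz)


def drift_after(seconds):
    """Position error accumulated by a stationary sensor after the given time."""
    steps = round(seconds / DT_S)
    return integrate_position([BIAS_M_S2] * steps, DT_S)


for t in (1.0, 10.0, 20.0):
    print(f"after {t:4.0f} s: drift = {drift_after(t):.4f} m")
```

Doubling the operating time roughly quadruples the error in this model, which is why such systems need to be reset against a known reference from time to time.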

Most of the devices described above are tethered by nature, or can be made
untethered only at considerable expense. Vision-based tracking, however,
offers an inexpensive option with wireless convenience. Vision-based
tracking used to be impractical due to the high computational load, but it
is becoming popular because of the increased capabilities and reduced costs
of the PC and associated hardware such as digital signal processing and even
graphics boards. Vision-based tracking still has relatively low accuracy,
unless markers or a known static background is used. There are also
special-purpose devices for tracking gaze, fingers (e.g., glove-like
devices), body postures, and human limbs.



1 Spaceball is a registered trademark of 3DConnexion, Inc.

2 Magellan is a registered trademark of Logitech Corporation.

Figure 7.1. Various tracking sensors: (a) magnetic trackers; (b) ultrasonic 3D mouse;
(c) Magellan; (d) Spaceball 3D mouse (cursor position moves in six degrees of
freedom by controlling the isotonic ball); (e) finger trackers (glove); and (f)
vision-based trackers (four cameras tracking a marker worn by the user).
(Pictures: Courtesy of Bo H. Cho.)



Event Generators 

A variety of discrete event generators are used for virtual reality systems.
From the viewpoint of interaction, we describe them by the parts of the body
used to initiate the events. The most typical interaction is carried out
through the hand. Button devices (typically mounted on trackers as "hybrid"
devices) are the most common hand- or finger-activated event generators
(see Figure 7.2). Another possibility is to use pressure sensors mounted
on the fingertips of gloves, and use finger pinch actions to generate many
different events. Hand (motion) gestures are also used often. However,
recognition of hand gestures is generally difficult because hand/finger
postures and movements need to be tracked either by vision-based techniques,
which usually require a certain background (e.g., a static background scene)
or high processing power, or by mechanical sensors that are difficult to use
ergonomically. Furthermore, this is all on top of the difficulty of
unambiguously recognizing gestures from continuous finger or joint data or
hand position. This problem is further exacerbated by the need to separate
the data that correspond to the gesture from those that do not (i.e., where
does the gesture start and end, and which portion is the neutral/meaningless
pose?).

Figure 7.2. Hand/finger activated (hybrid) button devices: (a) 3D mouse; (b) pinch
glove. 3

The foot has been used primarily for interaction control for navigation. 
Custom-made buttons or pressure-based sensors mounted on mats or floors
(or even stepper machines) have been used to detect footsteps, which are
then interpreted for navigation control (e.g., direction and speed; see Figure
7.3). These devices can be combined with rotational tables to add orientation 
and direction control as well. 

Voice and speech recognition represents another natural method for 
interaction in virtual reality systems as humans do use speech every day in 
conjunction with other modalities. The technology is advancing rapidly, but 
voice and speech recognition is only at a level to be used for isolated word 
recognition. It is still speaker-dependent (requires training) and suffers from 
low recognition rate if there is significant ambient noise (might require 
special microphones). To avoid recognition errors, contextual information 
can help (e.g., the "up" command is only recognized when the menu is 
present). Because most recognition algorithms are based on statistical
similarity, it is recommended to use short keywords that are as distinct from
each other as possible (e.g., avoiding similar-sounding pairs such as "right"
and "light"). In fact, most speech interfaces are
used for keyword recognition with fairly low word counts. Even so, using 
voice/speech recognition with other types of input method has been demon- 
strated to be very useful [Bol80]. 



3 Pinch Glove is a registered trademark of Fakespace Corporation. 

Figure 7.3. The "Walk-in-Place" metaphor used for the navigation interface. The
walking gesture of the user is recognized to control navigation in the virtual
environment. The enactment of the walking enhances the user-felt presence
[Sla95]. (Courtesy of Seok H. Jeon.)



Sensor Errors and Calibration 

VR systems employ a number of input sensors to realize 3D multimodal
interfaces as described in this chapter (see Figure 7.4). Among them, 3D
trackers play a very important role in VR systems (as the mouse does for the
2D desktop interface). However, due to their various operating principles and
external conditions, they can exhibit a large amount of error that results in
an incorrect reflection of user input and distorted output, thus making it
difficult for the user to accomplish a given task and also causing user
discomfort. For instance, magnetic trackers are among the most common VR
devices because of their wide operating range, high degrees of freedom (6D),
low latency, and relatively low cost. However, they suffer from significant
output distortion when there is interference from common electromagnetic
devices and objects in the environment such as monitors, power cables, metal
objects, and the like. Moreover, the underlying working principles make the
errors more apparent as the sensors move farther away from the magnetic
source.

Figure 7.4. Setting up coordinate systems in virtual space with (tracking) sensors.

One of the ways to battle such intrinsic sensor errors is to calibrate the
sensors, correcting the sensor output by adjusting it to match or conform to a
dependably known and unvarying measure. That is, we sample the output
values, throughout the operating space, at known locations and orientations
and record the error values to build a large table of error values indexed by
the designated positions and orientations. From the table, we can estimate
the relationship between the positions/orientations and the error values
using a polynomial function (usually of second or third order). The
function enables us to estimate the errors at positions and orientations that
were not sampled in the first place, and to correct the raw data in software.
However, such a calibration process requires tedious data collection, and
whenever the environment changes it must be carried out again.
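
The calibration procedure can be sketched in one dimension. The example below is a simplified illustration under several assumptions that are not from the text: the tracker distortion is simulated by a made-up function, only position along a single axis is considered, and the error is modeled as a second-order polynomial of the raw reading (so that it can be evaluated at run time, when only the raw reading is available). All names are hypothetical.

```python
def fit_polynomial(xs, ys, order=2):
    """Least-squares polynomial fit via the normal equations (small orders only)."""
    n = order + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution; coeffs[k] multiplies x**k.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def distorted_reading(true_position):
    """Simulated tracker (hypothetical): error grows with distance from the source."""
    return true_position + 0.02 * true_position ** 2


# Step 1: sample the tracker at known positions and tabulate the errors.
known_positions = [0.25 * i for i in range(9)]  # 0.0 m to 2.0 m
raw_readings = [distorted_reading(p) for p in known_positions]
errors = [raw - true for raw, true in zip(raw_readings, known_positions)]

# Step 2: fit the error as a function of the raw reading.
error_model = fit_polynomial(raw_readings, errors, order=2)


# Step 3: correct new raw readings in software.
def correct(raw):
    return raw - evaluate(error_model, raw)


raw = distorted_reading(1.1)  # a position that was not sampled
print(f"raw error: {raw - 1.1:.4f} m, corrected error: {correct(raw) - 1.1:.4f} m")
```

A real tracker would need the same idea extended to three positional and three rotational dimensions, which is what makes the data collection so tedious.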

Summary 

Sensors are as important as display systems: they allow the user to
convey her intention to control and access the virtual world. After all,
interaction is a two-way street. To use sensors effectively, their operating
principles and working characteristics must be understood. Calibration is
often needed to compensate for their inherent errors and distortion.

Pondering Points 

• Imagine that you are building a virtual running machine. Describe the 
most effective and natural sensor system to acquire various possible types 
of user intentions and actions. 

• As in desktop environments, menus are often used in virtual environ- 
ments. Describe various ways to use the menu system (e.g., pull-up 
menu, select and confirm, close menu). What kind of sensors or input 
methods would you use? What kind of feedback (upon input) would you 
use? Explain your answer. 



Chapter 8 

3D Multimodal Interaction Design 



Why Go 3D Multimodal? 

One of the ways to realize virtual reality is to use 3D multimodal
interfaces. Unlike the usual desktop interaction in which we use
the mouse and keyboard to click and type on a small 2D canvas to carry out 
various tasks, three dimensions and multimodality are important for virtual 
reality because humans do live and operate in a three-dimensional world 
employing various sensory and motor organs. Thus, it is fair to assume that 
3D multimodal interfaces will be natural for human users for many tasks, as 
they leverage the motor and sensory skills that we use every day. 

This does not mean that 3D multimodal interfaces will always be "better" 
than the traditional desktop 2D interface. Equally, there might be tasks 
that are best accomplished in a seemingly "unnatural" way. The way
humans carry out tasks in the real world may be assumed to be natural, but
it is bounded by various physical constraints. In the virtual world, where
operating constraints are different, the physical constraints of the real
world may be nullified to some extent. Only human ergonomics would constrain
the interaction design. That is, it may be possible, for certain tasks, to
devise "magical" interaction methods that are only possible in the virtual
world, yet are more efficient. Humans, with their great adaptive capability,
can often quickly learn such new interfaces. Thus, naturalness is not a
necessary condition for interaction efficiency. However, natural and ergo- 
nomic (suited to human evolution) interfaces are generally easy to use and 
learn. 

It has also been suggested that natural or ergonomically designed inter- 
actions contribute to a higher sense of presence. Badly designed interaction 
models and interfaces cause distractions and fatigue to the user, and lower 
the sense of presence. 3D multimodal interaction design is further
complicated by the fact that the devices and computational resources required
by the nature of the given task are usually not immediately available, so
"satisficing" (satisfy + suffice) solutions must often be adopted (e.g., high
efficiency but low sense of presence, or vice versa).

Structured Approach to the Interaction / Interface Design

In discussing 3D multimodal interaction, we first make an important dis- 
tinction between two terminologies: interaction and interface. Interaction is 
an abstract model that describes how a user accomplishes some tasks. 
Interface is a specialized choice of hardware and software through which a 
user communicates with the computer system (a particular implementation 
of the interaction model; see Figure 8.1). 

The key to a good interaction design is to carefully model the given task,
and propose a set of interfaces that satisfice the multidimensional criteria of
presence, naturalness, and efficiency. As with any design, interaction design
goes through iterative phases of synthesis/modification and evaluation,
all the more so because it lacks an established design methodology (compared
to its mature 2D interface counterpart). This is partly because the sensor
and display technologies (and their cost) are continually changing and the
goals of interaction design are often conflicting. Moreover, the evaluation
criteria are often loosely defined. Although quantitative measures of task
performance, such as the completion time and error rate, can be used for
interaction efficiency, presence, user preference, and naturalness are quite
subjective, and correct evaluation of an interaction design with respect to
these subjective criteria would ideally require a usability experiment with a
large number of subjects. This is often practically impossible in an
iterative design process. There is no complete, established methodology for
3D multimodal interaction design; however, a few guidelines from prior and
ongoing research do exist [Bow05], and they can be applied effectively to
reduce the trial-and-error cycles and the dependence on experience alone.

Figure 8.1. Interaction model and interface implementation.

Although the basic goal of any interaction design is to have the user
accomplish a certain task, the designer must prioritize possibly conflicting
goals such as task performance efficiency, presence, ease of use, learnability,
and so on. With this in mind, the first guideline in interaction design is to
carry out an analysis of the interactive task. This involves first identifying
the user and considering the human factor requirements.

For instance, items of consideration can include the age, gender, experi- 
ence with computers and games, eye strength, and so on. The heart of task 
analysis is the decomposition of the high-level task into smaller ones. When 
faced with a multidimensional task, such as moving an object in three- 
dimensional space, studies at the University of North Carolina at Chapel 
Hill have shown that users usually break the task into a series of lower-
dimensional (1D or 2D) problems [Bro77]. For instance, the task can be
characterized as moving the object in the xy-plane before moving it into its
final position by moving in the z-direction. Tasks may simply be
decomposed into subtasks based on their characteristics. Certain tasks may need to
be carried out one after another, and some may be done concurrently. There 
exist formal approaches to task analysis via decomposition such as the 
GOMS approach that is based on the human problem-solving model 
[Car80]. The task decomposition produces a hierarchy of tasks and sub- 
tasks as shown in Figure 8.2. The subtasks at the bottom of the hierarchy 
that cannot be decomposed any further are called primitive tasks. Typical 
primitive tasks include those such as object selection, object manipulation, 
navigation, and system control [Bow05]. In many cases, the high-level tasks 
can be designed as the collections and combinations of these primitive 
tasks (composite tasks). Figure 8.3 shows a specific example of a task 
hierarchy. 
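
A task hierarchy of this kind maps naturally onto a small tree data structure. The sketch below encodes the ship simulator example of Figure 8.3; the class, its fields, and the choice of which groups are parallel and which are sequential are assumptions made for illustration, since the text does not specify a representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A node in a task hierarchy (hypothetical representation)."""
    name: str
    ordering: str = "primitive"  # "parallel", "sequential", or "primitive"
    subtasks: List["Task"] = field(default_factory=list)

    def primitives(self) -> List[str]:
        """Collect the leaf (primitive) tasks in depth-first order."""
        if not self.subtasks:
            return [self.name]
        names: List[str] = []
        for sub in self.subtasks:
            names.extend(sub.primitives())
        return names


# Assumed structure: the three controls run in parallel, while selecting a
# device and then manipulating it happen one after another.
ship = Task("operate a virtual ship simulator", "parallel", [
    Task("control speed", "sequential", [
        Task("select an engine lever"),
        Task("manipulate an engine lever up and down"),
    ]),
    Task("control direction", "sequential", [
        Task("select a steering wheel"),
        Task("turn the steering wheel left/right"),
    ]),
    Task("control view"),
])

for name in ship.primitives():
    print(name)
```

Listing the leaves in this way gives the set of primitive tasks for which metaphors and concrete interfaces then have to be chosen.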



[Diagram labels: Task; Subtasks 1, 2, 3, 4; Parallel Tasks; Sequential Tasks;
Metaphor; Specific Implementation (Interface).]

Figure 8.2. Task analysis for interaction/interface design.

[Diagram labels: operate a virtual ship simulator; control speed; control
direction; control view; select an engine lever; manipulate an engine lever
up and down; select a steering wheel; turn the steering wheel left/right.
Legend: Parallel Tasks; Sequential Tasks.]

Figure 8.3. An example of a task hierarchy (the Ship Simulator case).



Metaphors 

Once the primitive tasks are identified, an interface can be designed with a 
specific choice of hardware devices and software components (e.g., graphic 
feedback, sound effects, speech recognition, etc.). Metaphors are often used 
in designing an interface for a given task. Metaphors, in the context of
human-computer interaction, are entities that are deliberately designed to
be easily manipulable (because of their familiarity, concreteness,
down-to-earthness, abstraction, etc.) for more intuitive control of a certain
task. The metaphors we use every day are mostly visual (e.g., icons), although
other modalities such as textual, aural, and even haptic are possible.
Metaphors help users build a mental model of computer systems by tapping into
their knowledge of a familiar domain that is mapped onto the unfamiliar domain
or task. The successful use of metaphors hinges upon how well user
expectations match what the interface object should and should not do [Pre94].
Note that metaphors can be implemented in both hardware and software.
Figure 8.4 shows several metaphors designed for the primitive task of
navigation. In fact, many different metaphors and interface designs have
been proposed for the four representative primitive tasks (i.e., selection,
manipulation, navigation, and system control) over the years [Bow05].
Many of them have also been tested for their usability. We give an
overview of them in the next section. However, before we do so, we discuss 
various issues in integrating different modalities for 3D and natural 
interaction. 

Figure 8.4. Various metaphors for the navigation task: (a) world-in-miniature:
navigate by moving in the mini-map [Sto95] (Reprinted with permission from
ACM © 2004.); (b) shiplike navigation; (c) navigation by natural motion (running);
(d) navigation with bicycling action [Kwo01]. (Reprinted with permission from IEEE
© 2004.)

Interface Design 

The first thing to remember in interaction/interface design is that interaction 
is a closed-loop task with the human user as a part of it. There are aspects of 
both input and output. The input is driven and guided by what is displayed 
to the user, and the display is affected by the user input. The parameters that 
the user controls would be determined through the task analysis and the 
most ergonomic hardware/software choice must be made within the given
resources. The user-controlled parameter must be reflected "immediately"
as a matched display output in the most intuitive and natural fashion. Again, 
what is intuitive and natural depends on the task at hand and although there 
do exist established methods for certain tasks, this aspect is subject to 
running experiments and going through some trial-and-error processes. 



[Figure 8.5 panels: mouse down; mouse up.]

Figure 8.5. The Arcball metaphor for object rotation [Sho92].



For instance, rotating objects as part of a manipulation task can be 
metaphorically represented through a virtual ball. When attempting to 
rotate a selected object, a spherelike semi-transparent template would be 
superimposed on the target object, and by rotating the sphere, the target 
object would rotate accordingly. The Arcball [Sho92] is arguably a natural
metaphor for representing the act of rotation, and is simple enough
(computationally) to be implemented so that the applied rotation is reflected
upon the target object immediately (with no noticeable delay; see Figure 8.5).
The relationship between the applied rotation and the amount actually
reflected upon the target object can be adjusted, but it should be kept within
reasonable bounds to be intuitive and natural.
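
To show how little computation such a metaphor needs, here is a minimal sketch in the spirit of the Arcball [Sho92]. The mouse-down and mouse-up points are mapped onto a unit sphere, and the drag is turned into a unit quaternion whose scalar part is the dot product and whose vector part is the cross product of the two sphere points. The coordinate conventions (screen coordinates already normalized to the range -1 to 1, sphere centered at the origin) and all function names are assumptions for illustration.

```python
import math


def to_sphere(x, y):
    """Map a 2D point in [-1, 1] x [-1, 1] onto the unit (arcball) sphere."""
    d2 = x * x + y * y
    if d2 <= 1.0:
        return (x, y, math.sqrt(1.0 - d2))
    d = math.sqrt(d2)  # outside the ball: snap to the rim
    return (x / d, y / d, 0.0)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def drag_quaternion(down, up):
    """Unit quaternion (w, (x, y, z)) for a drag from `down` to `up`."""
    v0, v1 = to_sphere(*down), to_sphere(*up)
    w = sum(p * q for p, q in zip(v0, v1))
    return (w, cross(v0, v1))


def rotate(q, v):
    """Rotate vector v by the unit quaternion q."""
    w, u = q
    t = cross(u, v)
    tt = cross(u, t)
    return tuple(v[i] + 2.0 * w * t[i] + 2.0 * tt[i] for i in range(3))


# Drag from the center of the ball to a point 45 degrees along the arc.
q = drag_quaternion((0.0, 0.0), (math.sqrt(0.5), 0.0))
print(rotate(q, (0.0, 0.0, 1.0)))
```

With this construction the object turns by twice the angle of the arc traced on the sphere, so the 45-degree drag above rotates the object by 90 degrees; this is one concrete case of the adjustable relationship between applied and reflected rotation mentioned in the text.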

One of the aspects of usability is the human factor. As explained in 
Chapter 6, the interfaces that we design must conform to our sensory and 
motor organs as much as possible. For instance, we must consider visual, 
aural, or kinesthetic (or any other modal) capacity of humans such as the 
effects and fatigue from using stereoscopic displays, the minimum or suffi- 
cient image resolution (in both lateral direction and depth), maximum 
tolerable audio intensity or quality, the minimum update rates for haptic 
rendering, tolerance for flickering, and so on. 

Although it is beyond the scope of this book to fully investigate the
process involved in designing ergonomic tools and devices for humans, one
ergonomic principle, called Fitts' law [Fit54], is important to remember.
Fitts' law was originally formulated in the context of real-world use, but
should be applicable to operations in the virtual world as well. Fitts' law
states that the time to reach a target object (which is related to the overall
task performance), whose width is W and which is at distance A from the user,
is logarithmically related to A/W (see Figure 8.6). Thus, this law can be used
to place and properly size virtual tools, 3D widgets, and menus with respect
to users.
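
The logarithmic relationship can be tried out numerically. The sketch below uses the simple form T = K log(A/W) stated in the text; the base-2 logarithm, the value of K, and the example widget sizes are assumptions for illustration, and in practice K would have to be measured for a given user and device.

```python
import math


def reach_time(distance, width, k=1.0):
    """Estimated time to reach a target: T = K * log2(A / W).

    `distance` (A) and `width` (W) must use the same unit; `k` is an
    empirically determined constant (assumed to be 1.0 here).
    """
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return k * math.log2(distance / width)


# Hypothetical virtual menu buttons placed 0.8 m from the user's hand.
for width in (0.05, 0.10, 0.20):
    print(f"width {width:.2f} m -> relative time {reach_time(0.8, width):.2f}")
```

In this form, doubling the width of a widget saves the same amount of time as halving its distance, which gives the designer two interchangeable ways to make a frequently used tool quicker to reach.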

[Figure 8.6 diagram: a user at distance A from an object of width W, with
T = K log(A/W).]

Figure 8.6. Fitts' law: the task performance time is related to the width of the
interaction object and the interaction distance in a logarithmic way [Fit54].



The information content in the display must be informative so that the 
interface is easy to use and drives the user to make as few mistakes as 
possible (e.g., by constraining the choices that can be made or by providing 
rich environment information, immediacy, and intuitive correspondence). 
For instance, in the Arcball example, we can augment the Arcball with 
graphics or text indicating the current rotation direction, or an ability to 
freeze certain rotational degrees of freedom. Another example is in the use of 
gestures for input. Many studies have concentrated on gesture recognition 
itself, rather than on the design of gestures. Many gestures do not have any 
relevance to the task at hand (no metaphorical connection). A typical case 
might be using various hand gestures or pinch codes to issue system
commands or invoke tasks. It is recommended that "natural" gestures be used
whose motion profile or gesture configuration is abstracted from the actual
in the geometric sense [Chu01]. Obscure gestures are difficult to recall,
just as voice/speech recognition systems with many keywords are difficult to
remember. Making an interface easy to use amounts to reducing the cognitive
load put on the user. This brings back the point that the type of user must be
considered, for instance, in terms of cognitive maturity and capability. The
"magic number" of 7 is often referred to as the approximate limit for our
short-term memory [Mil56].

Multimodality 

The whole premise behind multimodality is that by adding an "independ- 
ent" input channel, the amount of information that is processed by the brain 
is increased. The increase in information reduces the error and time taken to 
complete a task. It also reduces the energy consumption and the magnitudes 
of contact forces used in a teleoperation situation [Smi04]. 

Physiologically, as emphasized throughout Chapter 5, the most usable 
(within the resource constraint) input/output device that matches our sens- 
ory capabilities must be used. However, the most important issue pertinent 
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to multimodality is the issue of consistency. In fact, note that multimodal 
interfaces can be either simultaneous or sequential [Ovi03]. In the simultan- 
eous interface, different modalities would be used as input or output 
methods at the same time. In the sequential one, different modalities are 
used one after another. In the former, when more than two modalities 
are used at the same time, maintaining the consistency among them is very 
important. We have already described the fundamental problem with the 
stereoscopic display that has to do with the inconsistency between accom- 
modation and convergence cues that cause discomfort and sickness. The 
same goes for any multimodal input/output combination. For instance, it 
has been reported that humans will perceive a graphic and audio event to be 
different if separated at about 180 msec [Min98]. Humans are particularly 
sensitive to the discrepancy between the visual and vestibular sense. Virtual 
navigation often suffers from this problem, because visual feedback creates 
the sense of movement while the user is not actually moving (called vection). This is 
fundamentally difficult to overcome due to the very objective of virtual 
navigation. Incorporating an attentive task is believed to lower the sickness 
caused by such multimodal inconsistencies. 
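
As a rough illustration of the timing constraint mentioned above, the following sketch checks whether a visual and an audio event fall within the roughly 180 msec window reported in [Min98]. The function name and the use of a hard threshold are assumptions made for illustration; actual perception varies with the person and the stimulus.

```python
# Approximate threshold taken from the text (about 180 msec, [Min98]).
AV_FUSION_THRESHOLD_S = 0.180


def perceived_as_one_event(t_visual_s, t_audio_s, threshold_s=AV_FUSION_THRESHOLD_S):
    """Return True if the two onsets are closer together than the threshold."""
    return abs(t_visual_s - t_audio_s) < threshold_s


print(perceived_as_one_event(1.000, 1.050))  # onsets 50 msec apart
print(perceived_as_one_event(1.000, 1.250))  # onsets 250 msec apart
```

A system designer could use such a check as a budget: the combined latency difference between the graphics and audio pipelines should stay well under the threshold.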

It is sometimes difficult to achieve complete consistency among the different 
modalities. However, the use of multimodal interaction itself can enrich 
the virtual experience and even create synergistic effects, with one modality 
compensating for the others. It is generally accepted that multisensory feedback 
is beneficial to both presence and task performance in the context of 
virtual reality systems [Kam02; Sal01]. This is true only provided that the 
feedback from each modality is consistent with the others [Ovi03], and the 
multisensory feedback (or input) is configured appropriately for the task at 
hand [Ovi99]. The modality appropriateness hypothesis postulates that the 
modality that is most appropriate or reliable with respect to a given task is the 
modality that dominates the perception in the context of that task [Shi01]. 
Vision has higher spatial resolution, hence its dominance in spatial tasks, and 
audition has a higher temporal resolution, hence its dominance in temporal 
tasks [Shi01]. The visual cue typically overpowers the haptic cue. This fact 
can help in simulating the contact of a virtual object with a hard, 
immovable object. If the user is presented with a visual cue that the virtual 
effector has reached a hard surface, then even though the haptic interface does 
not render the force of a hard, stiff surface, but rather a linear Hooke's law 
approximation, the user can still be fooled into thinking the virtual wall is rigid 
[Smi04]. Many synergistic multimodal interaction systems have been devised 
and studied that employ gestures [Bol80], voice [Cor02], proprioception 
[Min97], speech/audio [Gra98; Min98], force feedback [Ric94; Sal01], and 
even smell, thermal [Din99], wind [Sen61], and biosignal feedback [Mit93]. 
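
The virtual wall example can be sketched as follows. This is a minimal one-dimensional illustration, assuming the wall occupies the region beyond `wall_x` and that the haptic device renders a linear spring force; the stiffness value and both function names are hypothetical. The visual channel simply refuses to draw the effector inside the wall, which is the cue that makes the soft spring feel rigid.

```python
def wall_force(effector_x, wall_x=0.0, stiffness=500.0):
    """Spring (Hooke's law) force in newtons pushing the effector out of the wall.

    The wall is assumed to occupy x > wall_x; stiffness is in N/m.
    """
    penetration = effector_x - wall_x
    if penetration <= 0.0:
        return 0.0  # free space: no contact force
    return -stiffness * penetration  # F = -k * x


def displayed_position(effector_x, wall_x=0.0):
    """Visual cue: draw the effector stopped at the wall surface, never inside it."""
    return min(effector_x, wall_x)


# The device has sunk 2 mm into the wall, but the picture shows it at the surface.
print(wall_force(0.002), displayed_position(0.002))
```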

On the other hand, multisensory interactions can also modify user per- 
ception, as illustrated by the famous McGurk effect. The McGurk effect is a 
perceptual phenomenon in which vision alters speech perception [McG76]. 
Simple visual tricks can easily alter the body image that is created by the 
proprioceptive sense [Ram98]. In the study of Yang et al., for instance, it has 
been reported that the use of proprioception and tactility in addition to 
visual feedback makes it possible to increase the effective field of view (or 
Geometric Field of View, GFOV 1 ) to make the scene more visible without 
introducing any negative effects (such as distorted size perception) [Yan04]. 
Burdea et al. reported that multimodal interaction involving stereoscopic 
display, audio, and haptic feedback increased task efficiency significantly 
[Ric96]. When active haptics is not feasible, passive haptics and the use of 
props (along with visual feedback) can produce very good effects. 

Although the best-known cross-modal effects are those of vision influencing 
other modalities, visual perception can also be altered by other modalities 
[Wal99]. In particular, the perception of distance is due to a 
combination of visual and motor input (muscles of the eyes, neck, and other 
body parts), and therefore proprioception plays a major role in spatial 
perception [Ber00]. We already pointed out that proprioception is also 
important in creating a body image and thus in promoting self-presence, 
one of the main goals of any virtual reality system. Slater et al. have reported 
that higher presence was achieved when one's own body was shown (through 
an HMD) as a way of matching the proprioceptive sense to that of the visual 
[Sla97]. Another work by Slater and his colleagues has shown that employ- 
ing a proprioceptive interface involving physical motion resulted in a higher 
user-felt presence, for instance, the "walk-in-place" metaphor for navigation 
versus simply pressing mouse buttons [Sla95]. 

Cases of Interaction/Interface Design 

In this section, we present several cases of interaction and interface design 
for various applications. Each case illustrates the domain customization and 
the process of trade-offs among different issues depending on the goal of the 
application. These interesting issues include multimodality, tangibility, gen- 
eric and commonly used interfaces, limited device capabilities, task analysis, 
human factors, presence, alphanumeric input, and so on. 

a. Case Study 1: Ship Simulator 

Figure 8.2 showed the task decomposition for the Ship Simulator used in the 
earlier part of this book. The functions of the Ship Simulator can be mainly 



1 The GFOV differs from the actual physical FOV in that it refers to the angle 
encompassing a given scene (in its original scale). For instance, a 100% GFOV 
coincides with that of the physical FOV, and 200% GFOV would allow one to see 
twice as much angularwise (or the scene is reduced in half angularwise). 
Figure 8.7. A trainee user interacting with the Ship Simulator to control the steering 
handle and the engine lever. (Courtesy of J. Seo.) 



divided into two concurrently applicable functionalities: operating the ship 
(initiated by the trainee) and setting the training environment and situation 
(available to the trainer through a separate control point). The ship operation, 
which is basically a navigation task, is further decomposed into three subtasks: 
ship speed control, direction control, and view control. Primitive 
tasks such as selection and manipulation, and possibly certain metaphors for 
them (in this case, no metaphors are used), are needed to 
realize navigation control. As for the trainer, the task of setting up the 
training situation or environment is further refined into several subtasks, 
such as introducing additional (or removing existing) ships and environment 
objects, and changing ship engine parameters and environment conditions 
(e.g., day or night, weather). We can judge that, for the trainee interface, 
implementing it with 3D multimodality will be useful for the transfer of 
training to the real ship maneuvering situation (see Figure 8.7), whereas for 
the trainer, a desktop interface using the keyboard and mouse would be 
sufficient. 
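
The decomposition just described can be written down as a small tree, which is often a convenient starting point for assigning an interface to each leaf. The sketch below is only one possible encoding; the dictionary layout, the intermediate labels "Navigation" and "Setup", and the helper name `leaf_tasks` are assumptions, while the task names follow the text.

```python
SHIP_SIMULATOR_TASKS = {
    "Operate ship (trainee)": {
        "Navigation": [
            "Ship speed control",
            "Direction control",
            "View control",
        ],
    },
    "Set training environment and situation (trainer)": {
        "Setup": [
            "Add or remove ships and environment objects",
            "Change ship engine parameters",
            "Change environment conditions (day/night, weather)",
        ],
    },
}


def leaf_tasks(node):
    """Yield the leaf subtasks of a nested dict/list task hierarchy, depth first."""
    if isinstance(node, dict):
        for child in node.values():
            yield from leaf_tasks(child)
    elif isinstance(node, list):
        for child in node:
            yield from leaf_tasks(child)
    else:
        yield node


for task in leaf_tasks(SHIP_SIMULATOR_TASKS):
    print(task)
```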

b. Case Study 2: Immersive Authoring [LeeG04] 

Suppose we would like to construct interactive content, such as the 
famous story of the hare and the tortoise, in an immersive manner. To 
realize this story as VR-based content, several subtasks will be required. 
The objects must be modeled (i.e., geometric shape and configuration) 
according to the details required by their functions. For instance, the 
hare's running perhaps requires modeling of the legs and a rhythmic 
motion/sound associated with the legs. The initial scenes must be put in 
place. Specific details need to be filled in for the object's behavior such as 
timing, motion profiles, conditions, triggering events, and the like. The 
director (i.e., user) should be able to insert special effects and sound tracks, 
and change the lighting conditions, along the behavioral time line. All of these 
important modeling subtasks may be repeated and rehearsed as the content 
develops and matures, during which time the director takes notes, adjusts 
parameters, tries different versions, replays and reviews, and even acts out the 
object's role. The director will also constantly require various types 
of information to make decisions and carry out these tasks. Many of 
these tasks can benefit from 3D interaction. Table 8.1 shows a 
possible task hierarchy and interface proposal. One of the key interaction 
requirements will be to provide the feeling of "concreteness" so that less 
tech-savvy people, such as artists and producers, can quickly learn and use 
the authoring system. More detailed descriptions of possible choices for 

Table 8.1. Possible subtasks and interfaces for immersive authoring. 

Subtask Hierarchy                                   Possible/Proposed Interface

Objects (actors)
  Form specification
    Geometric modeling specification                Direct manipulation/2D GUI
    Object placement/rotation                       Direct manipulation/props
    Shape modification (scaling)                    Direct manipulation/props
    Attribute setting                               2D GUI/props
  Function specification
    Motion specification                            PBD/props
    Scripting/programming                           Keyboard (virtual/real)
    Adaptation                                      N/A
  Behavior specification/user interaction
    Scripting/programming                           Keyboard (virtual/real)
    Model-based specification                       PBD/2D GUI props
    Spatial constraints
    Events
    Actions
    Routes
    Synchronization of two objects                  Two-hand tracking
    Synchronization of multiple (> 2)               Mixture
      object behaviors
    Role playing (object control)                   Mixture
    Deployment                                      Direct manipulation in 3D
    Adaptation                                      Mixture
Scenewide operations
  Inserting effects
    Sound                                           2D GUI
    Lighting/camera                                 World in miniature [Sto95]
                                                    Direct manipulation in 3D
  Reviewing
    Navigation                                      2D GUI/button
Information gathering and retrieval
  Note taking                                       Keyboard
  Information request
    Behavior/timing                                 2D Graphics/text
    Object-specific                                 2D Graphics/text
    Scene-specific                                  2D Graphics/text
Version management
  Save/replay                                       2D GUI
  Compositing                                       2D GUI

metaphor and interface design are treated in the second part of the case 
studies. 

To develop an immersive authoring system with a reasonably compre- 
hensive set of functionalities and the associated interactions, the various 
interaction models and proposed interfaces must be consolidated into a 
manageable set with usability and device constraints in mind (see Figure 
8.8). Table 8.2 shows the consolidated interface design with three main 
interaction models: direct manipulation using a virtual hand (or through 
props with a real hand when the see-through option is used) for most 
important authoring tasks, conventional programming through alpha- 
numeric input using the real keyboard, and the use of 2D GUI within the 
3D space for other system controls. 

Considering that the scenes of stories usually resemble the real-world 
environment, and because most of the participants will be naive users (e.g., 
children), a virtual hand is chosen as the interaction method. Two hands 
must be tracked to realize both the coordination of multiple object 
motions/behaviors and the keyboard-based alphanumeric input. In contrast with the 
execution environment, the authoring environment needs many more interaction 
techniques and modes for modeling tasks. Therefore, a menu system is 
necessary to organize these various interaction techniques and modeling 
tasks. Under this requirement, an iconic menu is presented to the user at 
a fixed position relative to the view parameters. Using 3D manipulation 
with a virtual hand for the menu system is the approach most consistent with the 







Figure 8.8. A possible interface for immersive authoring. A user immersed in the 
virtual space can specify form, function, and behavior of virtual objects and execute 
and test them, by directly interacting with virtual objects in a concrete manner: (a) the 
user selecting a virtual fish with a virtual hand and scaling it using a 3D widget; 
(b) the user demonstrating a collision event by grabbing one virtual fish (blue) and 
moving it toward the other (yellow) [LeeG02]. (Reprinted with permission from the 
ACM © 2004.) 
Table 8.2. Consolidated interface design for PiP 

Chosen interface        Subtasks                              Devices needed

Direct manipulation     Object placement/rotation/scaling     Two trackers for two hands,
(virtual hand)/         Motion specification                  camera, props, button device
Props (real hand)       Attribute setting/viewing
                        Deployment
                        Navigation/camera control
                        (world in miniature)
                        Behavior coordination
                        Model-based behavior specification

Keyboard/button         Scripting/programming                 Two trackers for two hands,
                        Note taking                           keyboard, button device
                        Role playing (object control)

2D GUI/menus            Discrete attribute setting/viewing    Tracker, button device
                        Version management
                        System control/effects



execution environment. However, navigating through many menu items using a 
virtual hand might fatigue the participants' arms. In addition, 
continuously moving their hands between a virtual object and the menu 
system is quite inefficient; it would be more efficient if menu items 
were selected in another way and the participants' hands stayed with the 
virtual object being manipulated. To address this problem, an additional 
interface is introduced for menu selection. Users hold a three-button 
prop in their nondominant hand and select menu items by pressing 
the buttons on it (e.g., left, right, select). The final set of interaction 
devices needed to realize these interactions/interfaces is two 6DOF motion 
trackers and two buttoned devices for the direct manipulation tasks and 2D 
GUI interaction. Figure 8.9 shows the VR devices used in this immersive 
authoring system: a head-mounted display, 6DOF trackers for tracking head 
and hand positions and orientations, and a three-button prop for menu 
selection. 
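
The three-button menu prop can be modeled as a very small state machine. The sketch below is an assumed illustration, not the interface code of the system described: the class name `ButtonMenu`, the wrap-around behavior, and the example item names are all hypothetical.

```python
class ButtonMenu:
    """Step through menu items with 'left'/'right' and confirm with 'select'."""

    def __init__(self, items):
        if not items:
            raise ValueError("menu needs at least one item")
        self.items = list(items)
        self.index = 0  # currently highlighted item

    def press(self, button):
        """Handle one button press; return the chosen item on 'select', else None."""
        if button == "left":
            self.index = (self.index - 1) % len(self.items)
        elif button == "right":
            self.index = (self.index + 1) % len(self.items)
        elif button == "select":
            return self.items[self.index]
        else:
            raise ValueError(f"unknown button: {button!r}")
        return None


menu = ButtonMenu(["place", "scale", "record motion"])
menu.press("right")
print(menu.press("select"))
```

Because the dominant hand never leaves the object being manipulated, the highlighted item can be changed while a manipulation is still in progress.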

c. Case Study 3: Tabletop Computing 

Interaction modeling and interface design are not only applicable to VR 
systems. Figure 8.10 shows a platform for computing and interaction based 
on a tabletop environment (similar to a workbench style of display system). 
One can envision a particular set of applications (e.g., three or four family 
members playing a board game) and the common interactions/interfaces required 
in such an environment by the same task analysis process. Figure 
8.11 shows a simple task analysis for several possible suitable applications on 
the tabletop environment. The common primitive subtasks are identified to 
form a basis for developing a unified consistent form of interface. Note that 
a few different styles of interfaces may be employed for one task depending 
on the application. However, the number of different options should be 
Figure 8.9. Interaction and display devices proposed for the task of immersive 
authoring: a head-mounted display, button device, tracker, and a glove (for simple 
gesture input) [LeeG02]. (Reprinted with permission from the ACM © 2004.) 

minimized as much as possible to create a generic computing platform. For a 
generic computing environment, emphasis should be put on overall 
usability and consistency rather than on VR-related concepts such as experience 
and presence. 

An important part of designing the appropriate interfaces for tabletop 
computing is the selection of the tracking technology. We might consider 
two possible choices that are wireless and device- or marker-free: using a 
touchscreen and using a camera-based hand (or fingertip) tracking system. 







Figure 8.10. A hypothetical tabletop computing environment. 
Tabletop Application    Subtasks

Application Manager     Icon Selection, Execute Program, Icon Moving,
                        Work Space Designation, Window Resizing, Menu Selection
Board Game              Select Piece, Move Piece, Draw Card, Confirmation
Remote Control          Select Program, Choose Channel, Pass Control

Primitive tasks to which the subtasks are mapped: Selection, Manipulation


Figure 8.11. Extracting common primitive subtasks from several applications. 



The touchscreen system provides stable tracking, whereas camera-based 
hand tracking can be rather unstable due to slight variations in skin color and 
environment lighting, jitter caused by the constantly changing shape of the hand 
blob, and so on. However, because the table is relatively low yet large in area, 
touching the screen, especially the far part of the table, can be ergonomically 
difficult. A simple application of Fitts' law is illustrated in Figure 8.12. 
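
For readers who want to reproduce this kind of comparison, the sketch below evaluates Fitts' law in its widely used Shannon formulation, MT = a + b log2(D/W + 1), where D is the distance to the target and W its width. The coefficients `a` and `b` are placeholders chosen for illustration; in practice they are fitted from measurements for each device and user population.

```python
import math


def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time in seconds (Shannon formulation of Fitts' law).

    distance and width must share the same unit; a (s) and b (s/bit) are
    assumed example coefficients, not measured values.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    index_of_difficulty = math.log2(distance / width + 1.0)  # in bits
    return a + b * index_of_difficulty


# Same 10 cm icon, near (35 cm) versus far (70 cm) on the table.
print(fitts_movement_time(35, 10), fitts_movement_time(70, 10))
# Doubling both distance and icon width keeps the predicted time unchanged.
print(fitts_movement_time(70, 10), fitts_movement_time(140, 20))
```

The second line of output shows why far targets need to be drawn larger if the same bound on task time is to be kept.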

With smaller icons or objects, the task time will generally increase, al- 
though it is possible to pack more items within the limited table area. In 
addition, in order for the hand-tracking camera to cover the whole tabletop 
space, it must be installed high above the table, and depending on the 
capability of the camera, the effective accuracy might prevent the icons or 
objects from being too small. Thus, a compromise solution might be to use 
both touchscreen and camera-based hand tracking for the space immediately 
near the user, and hand tracking only for space relatively far from the user. 
This way, for far objects, the user can select and manipulate objects through 
a cursor drawn by extending the line between the user's assumed head (or 
body) position and the tracked hand tip to the surface of the table. The size of 
the objects or icons can be set at a medium size (less than about 5 cm), and varied 
dynamically according to the distance from the user. Aside from tracking 
the hands, an additional interface such as voice or gesture recognition may 
be considered. For instance, a selection confirmation might be made based 
on time passage, that is, the cursor staying for an extended period of time 
above a particular interactable button might trigger the "push" action. Or, it 
Figure 8.12. Using Fitts' law to compare the projected task time between 
having to touch the table screen and using camera-based hand tracking so that 
the user can interact above the table. The object size must change to maintain the 
theoretical bound on task performance time. 

might be more efficient to employ a simple voice command and recognition 
(e.g., upon selection, user says, "Push"). Ideally, the final choice of the 
interface should be determined after an extensive usability study comparing 
different aspects (task performance, preference, fatigue, etc.). As for the 
primitive tasks such as selection, manipulation, navigation, and system 
control, many possible interface designs have been proposed and studied 
extensively including their task analysis (although simple) and taxonomy 
[Bow05]. We introduce them in the next few sections. 
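
The far-object cursor described above, obtained by extending the line from the assumed head position through the tracked hand tip down to the table, is a ray and plane intersection. The following is a minimal sketch under assumed conventions: coordinates are (x, y, z) in meters with z pointing up, the table surface is the plane z = `table_z`, and the function name is hypothetical.

```python
def table_cursor(head, hand, table_z=0.0):
    """Return the (x, y) point where the head-to-hand ray meets the table plane.

    Returns None when the ray is parallel to the table or points away from it.
    """
    hx, hy, hz = head
    px, py, pz = hand
    dz = pz - hz
    if dz == 0.0:
        return None  # ray runs parallel to the table surface
    t = (table_z - hz) / dz  # ray parameter: 0 at the head, 1 at the hand
    if t <= 0.0:
        return None  # the table lies behind the ray
    return (hx + t * (px - hx), hy + t * (py - hy))


# Head 1 m above the table, hand held 0.5 m above it and slightly forward.
print(table_cursor((0.0, 0.0, 1.0), (0.2, 0.1, 0.5)))
```

Since tracking jitter in the hand position is magnified by the factor `t`, some smoothing of the cursor is usually advisable for distant targets.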

d. Case Study 4: Selection and Manipulation 

According to Bowman et al. [Bow05], selection is the task of acquiring or 
identifying a particular object from the entire set of available objects. The 
task of selection is one of the most important tasks in any interaction 
because it is needed in virtually any application. Manipulation is also a 
very common task and usually involves positioning and rotation in 3D 
space. Note that selection must usually precede manipulation. Thus, the 
tasks of selection and manipulation are discussed together. 

Figure 8.13 shows a classification of various selection techniques by task 
decomposition [Bow05]. The task of selection mainly involves these sub- 
tasks: (1) indication of the object to be selected, (2) confirmation of the 
selection, and (3) providing feedback. Each of these subtasks may be realized 
Figure 8.13. The selection task decomposed into subtasks [Bow05]. (Reprinted by 
permission of Pearson Education, Inc., as Pearson Addison-Wesley © 2005.) 



by various interfaces. For instance, the indication of the target object may be 
done by a direct touch, by voice, by laser-pointer-like picking (also called virtual 
raycasting) from the hand (or by gaze), by flashlight- or spotlight-like picking (see 
Figures 8.14 through 8.16), and so on. Selection confirmation may be 
realized by a discrete event (such as a button press), simple gesture, voice 
command, passage of time, and so on. The feedback after a selection is made 
is also important in that the user is immediately notified as to the result of 
the action. It may be visual (e.g., highlighting the object), aural (e.g., simple 
beep), or even haptic/tactile. 
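
As one concrete instance of the indication subtask, the spotlight style of picking can be sketched as a cone test followed by a nearest-object choice. This is an illustrative simplification: objects are treated as points, "closest" is taken to mean closest to the apex of the cone, and the function and parameter names are assumptions.

```python
import math


def spotlight_select(apex, direction, half_angle_deg, objects):
    """Return the name of the nearest object inside the selection cone, or None.

    objects maps a name to an (x, y, z) position; direction need not be unit length.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("direction must be a nonzero vector")
    axis = tuple(c / norm for c in direction)
    cos_limit = math.cos(math.radians(half_angle_deg))

    best_name, best_dist = None, math.inf
    for name, pos in objects.items():
        offset = tuple(p - a for p, a in zip(pos, apex))
        dist = math.sqrt(sum(c * c for c in offset))
        if dist == 0.0:
            continue  # object sits exactly at the apex; its direction is undefined
        cos_angle = sum(o * a for o, a in zip(offset, axis)) / dist
        if cos_angle >= cos_limit and dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


scene = {"near": (0.1, 0.0, -1.0), "far": (0.0, 0.0, -3.0), "side": (2.0, 0.0, -1.0)}
print(spotlight_select((0.0, 0.0, 0.0), (0.0, 0.0, -1.0), 15.0, scene))
```

Confirmation and feedback would follow as separate steps, for example a button press and a highlight on the returned object.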

Figure 8.17 shows a similar task hierarchy for object manipulation 
[Pou98]. Manipulation is divided into the major tasks of (1) attaching the 
target object to a control medium for manipulation (which practically 
amounts to object selection); (2) actual positioning; and/or (3) actual orienting; 




Figure 8.14. Virtual hand selecting an object by collision or direct touch. 
Figure 8.15. Spotlight selection metaphor [Lia94]. The closest object among the 
objects within the cone is selected. 

and (4) producing feedback indicating whether the task has been carried 
out successfully. Similarly, many different interfaces are possible. For in- 
stance, the object may be attached to the control medium simply by direct 
selection with the virtual hand with which the manipulation would be carried 
out. Note that the medium for selection and manipulation need not be the 
same, although they usually are as in the previous example. The object may 
be attached to various types of control media such as the virtual hand, 
teleoperated cursor, widget, or virtual tool, gaze line (a line connecting the 
assumed position of the eyes and the target object), and so on. 

Once the object is selected and attached to the control medium, the actual 
positioning and orienting subtasks can be carried out and the fashion in 
which they are carried out depends on the way the object is selected. For 
instance, an object selected with a virtual hand and attached to a virtual 
hand can be manipulated, again by direct hand movement. The mapping 
between the actual hand movement and the virtual object movement may be 




Figure 8.16. Occlusion-based selection for augmented reality applications. When 
a marker is occluded for a predetermined period of time, the corresponding object 
is selected and rendered in the view. (Courtesy of Gun Lee.) 
Figure 8.17. Manipulation decomposed into subtasks and possible styles of interfaces 
for them [Pou98]. 

adjusted and controlled. If an object was selected with a virtual ray, then the 
object can be moved or rotated according to the movement of the hand that 
controls the direction and orientation of the ray. Manipulating objects from 
a far distance using a virtual ray tends to be difficult. One way to alleviate 
this problem is to select the objects far from the user and "bring" them 
within close range so that the object can be manipulated directly. Then the 
manipulated object is put back into the original place. However, the task of 
putting back the object can be nontrivial. The method called the scaled 
world grab scales down the entire virtual environment around the user's 
viewpoint to a convenient size so the user can manipulate the object once it 
is selected [Min97]. This has the same effect as bringing the (far) object 
to the user for manipulation, without having to worry about how to put it 
back. That is, after manipulation, the world is scaled back to its original 
size, thereby taking the object back to its intended position and orientation 
(see Figure 8.18). 
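
The geometric core of the scaled world grab is a uniform scaling of the scene about the user's viewpoint. The sketch below assumes point-like objects and chooses the scale factor so that the selected object lands at a comfortable reach distance; the function names and the `reach` parameter are hypothetical and are not taken from [Min97].

```python
import math


def scale_about(point, center, factor):
    """Uniformly scale a point about a center: p' = c + factor * (p - c)."""
    return tuple(c + factor * (p - c) for p, c in zip(point, center))


def scaled_world_grab(world, viewpoint, target, reach=0.5):
    """Scale every object about the viewpoint so the target ends up at `reach`.

    Returns the scale factor and the scaled positions; scaling by 1 / factor
    afterwards restores the original layout.
    """
    distance = math.dist(world[target], viewpoint)
    if distance == 0.0:
        raise ValueError("target coincides with the viewpoint")
    factor = reach / distance
    scaled = {name: scale_about(p, viewpoint, factor) for name, p in world.items()}
    return factor, scaled


world = {"tree": (0.0, 0.0, -10.0), "rock": (4.0, 0.0, -2.0)}
factor, scaled = scaled_world_grab(world, (0.0, 0.0, 0.0), "tree")
print(factor, scaled["tree"])
```

Because the scaling is centered at the viewpoint, the projected image of the scene is unchanged at the moment of the grab, and the inverse scaling returns every untouched object to its original place.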

Another way to overcome this problem, called the Go-Go/Homer technique, 
was proposed by Poupyrev et al. [Pou96]. The Go-Go technique 
extends the intuitiveness of direct selection by virtual hand to objects 
beyond the user's reach (see Figure 8.19). For objects that are far away, 
the user can extend the virtual arm or hand (like the Go-Go gadget) to reach 
the target object, and the virtual arm grows nonlinearly with respect to the 
extension of the real hand (it grows faster as the hand extends toward 
the object). Once the object is selected this way, it can be manipulated as if it 
were grabbed by the hand. To allow the user to position virtual objects 
Figure 8.18. Scaled world grab. The scene is scaled down and in the process brought 
into the space reachable by the user. The user manipulates the objects in his personal 
space and scales it back to normal size [Min97]. (Reprinted with permission from 
ACM © 2004.) 



within a large manipulation range, the technique linearly scales the user's 
reaching distance (or user motion) within the user-centered coordinate system. 
Although operationally it is similar to virtual raycasting and remote 
manipulation, by the mere fact that it "seems" as if the object is actually 
grabbed by the hand, it preserves the intuitiveness offered by the original 
direct manipulation with the virtual hand. 
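
The arm-extension mapping of the Go-Go technique is commonly written as a piecewise function: within a threshold distance the virtual hand follows the real hand one to one, and beyond it the virtual arm grows faster than the real one. The sketch below uses a quadratic growth term, which is the form usually given for Go-Go; the threshold and gain values are assumed examples.

```python
def gogo_arm_length(real_distance, threshold=0.4, gain=6.0):
    """Map the real hand distance (m, from the body) to the virtual hand distance.

    Below the threshold the mapping is the identity; beyond it a quadratic
    term extends the virtual arm. threshold and gain are illustrative values.
    """
    if real_distance < 0:
        raise ValueError("distance must be non-negative")
    if real_distance < threshold:
        return real_distance
    return real_distance + gain * (real_distance - threshold) ** 2


for r in (0.2, 0.4, 0.5, 0.6, 0.7):
    print(r, round(gogo_arm_length(r), 3))
```

The mapping is continuous at the threshold, so the user does not notice a jump when the hand crosses from the one-to-one zone into the extended zone.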

In the approach called the World-in-Miniature by Stoakley [Sto95], a 3D 
mini-map of the entire world is given to the user for selection, manipulation, 
and even navigation purposes (see Figure 8.20). The mini-map represents 
the actual virtual world that the user is immersed in, and thus by selecting 
and manipulating the mini-objects in the 3D mini-replica, actual corre- 
sponding objects can be selected and manipulated. As one of the objects in 




Figure 8.19. The Go-Go technique for selection. The function maps the user movement 
to the extension of the virtual arm [Pou96]. (Reprinted with permission from 
ACM © 2004.) 
the 3D mini-replica includes the user, selecting and manipulating it amounts 
to navigation (changing the position and orientation of the user). 
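
The correspondence between the miniature and the full-scale world can be sketched as a simple change of coordinates. The version below assumes the miniature is a uniformly scaled, axis-aligned copy of the world (no rotation), held at `mini_origin`; these names and the scale value are assumptions for illustration.

```python
def mini_to_world(mini_point, mini_origin, scale, world_origin=(0.0, 0.0, 0.0)):
    """Convert a point on the miniature into full-scale world coordinates."""
    return tuple(
        w + (m - o) / scale
        for m, o, w in zip(mini_point, mini_origin, world_origin)
    )


def world_to_mini(world_point, mini_origin, scale, world_origin=(0.0, 0.0, 0.0)):
    """Convert a full-scale world point into its position on the miniature."""
    return tuple(
        o + (p - w) * scale
        for p, o, w in zip(world_point, mini_origin, world_origin)
    )


# A 1:100 miniature held at chest height; moving a mini-object 5 cm to the
# right moves the real object 5 m to the right.
mini_origin = (0.0, 1.0, -0.3)
print(mini_to_world((0.05, 1.0, -0.3), mini_origin, scale=0.01))
```

Applying `mini_to_world` to the mini-object that represents the user yields the new viewpoint, which is how manipulation of the miniature turns into navigation.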

In order to move and orient an object in 3D space, six degrees of freedom 
are needed. However, allowing all six degrees of freedom can hamper exact 
positioning and orienting. Thus, it can be useful to limit the degrees of 
freedom, or consider only a few degrees of freedom at a time using constrained 
movement or rotation. 
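
Constraining the degrees of freedom can be as simple as discarding the components of the hand motion along the disallowed axes. The helper below is a hypothetical sketch for translation only; rotation could be constrained in the same spirit, for example by keeping only the rotation about one chosen axis.

```python
def constrain_translation(delta, allowed_axes):
    """Keep only the translation components on the allowed axes ('x', 'y', 'z')."""
    return tuple(d if axis in allowed_axes else 0.0 for d, axis in zip(delta, "xyz"))


# The tracked hand moved in all three axes, but the object may only slide
# on the floor plane (x and z).
print(constrain_translation((0.2, -0.1, 0.4), {"x", "z"}))
```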

e. Case Study 5: Navigation 

Navigation, as already mentioned, is also a very commonly required task in 
any virtual environment. Figure 8.21 shows the major subtasks for naviga- 
tion and possible interfaces for them. The first subtask is to select the 
direction or the target to which the user wants to move. Then the second 
optional task may be to select the amount of the velocity (or even acceler- 
ation). The third task (which could be combined with the first task) is the 
actual command to move toward the selected direction. This command will 
vary in input conditions, for example, whether a long move requires a 
repeated application of small moves, or just commands for start and stop, 
or other more sophisticated control (with acceleration and deceleration 
control). 

For instance, the direction of travel may be determined by the gaze 
direction, 3D pointing, discrete (e.g., arrow keys) or continuous (e.g., steer- 
ing wheel) event devices, voice commands, body gestures (e.g., leaning to the 
right or left), and even through haptic interaction. Note that although gaze-directed 
travel can be less tiring and more efficient, it prevents users from 
viewing the environment in directions other than that of the movement. The styles 
Figure 8.21. Navigation task hierarchy and possible interface styles [Bow99]. 
(Reprinted with permission from MIT Press © 2004.) 



of interfaces for these subtasks are often designed after a selected metaphor. 
Examples include driving a car, flying an airplane, flying a helicopter, 
walking, navigating a spaceship, and the like. Many metaphors are real-life 
inspired, but "magical" navigations are certainly possible such as in the 
World-in-Miniature approach. 
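
The three navigation subtasks (direction, velocity, and the move command) combine into a simple per-frame update. The sketch below shows gaze-directed travel under assumed conventions: the gaze direction comes from head tracking, `speed` is chosen by the user or the application, and the function is called once per frame while the "move" input is active. The function name is hypothetical.

```python
import math


def travel_step(position, gaze_direction, speed, dt):
    """Advance the viewpoint along the gaze direction for one frame.

    position and gaze_direction are (x, y, z); speed is in m/s and dt in s.
    """
    norm = math.sqrt(sum(c * c for c in gaze_direction))
    if norm == 0.0:
        return tuple(position)  # no valid direction: stay in place
    return tuple(p + speed * dt * g / norm for p, g in zip(position, gaze_direction))


pos = (0.0, 1.6, 0.0)
for _ in range(3):  # three frames at 10 Hz while the move button is held
    pos = travel_step(pos, (0.0, 0.0, -2.0), speed=1.5, dt=0.1)
print(pos)
```

Replacing `gaze_direction` with a pointing direction from a hand tracker gives pointing-directed travel, which leaves the head free to look around while moving.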

The task of navigation can occur in several different contexts, whether it is 
for an open exploration (just looking around without a specific destination), 
exploration following a path, searching without a path, training to learn the 
spatial layout, and so on. Aside from designing the most appropriate inter- 
face for the navigation task itself, it can be helpful to use landmarks, signs, 
and maps to achieve higher task efficiency (e.g., help the user find something 
faster). Without such aids, virtual navigation often results in users getting 
lost and becoming disoriented. 



f. Case Study 6: Menu Selection and Invocation 
(System Control) [KimN00] 

Another important and commonly required interaction task in any applica- 
tion is system control. System control refers to the way of making a com- 
mand to the system to perform a particular function that is usually not 
3D-related. There may be a sizable number of functions available and often 
they are organized in a hierarchical manner. A menu system is quite suitable 
to serve as a means for system control, whether driven by a mouse and 
keyboard, by voice, or even by 3D interaction. The menu system is also the 
most familiar 2D computer interface that we know of, and it would be 
beneficial for computer users to have it extended in the immersive 3D 
environments. In this section, we analyze some of the ways a menu system 
can be realized in the context of providing system control for 3D virtual 
worlds. 

In 3D environments, unlike in 2D, we must first carefully consider where 
to locate the menu system within the world, which in turn, will determine the 
user's viewing direction to the menu (see Figure 8.22). There are some simple 
possibilities we might consider. 

1. World Fixed (WF). The menu system resides at a fixed location in a 
"strategic" world location. 

2. View Fixed (VF). The menu system is attached at and viewed from a fixed 
offset from the user (thus it moves with the head-tracked user). 

WF allows a relatively comprehensive display of the overall menu structure 
and menu selection history (because it is located at a strategic location away 
from where the task is being carried out), whereas with VF, a more compact 
menu display must be used so as not to block the task area. This is especially 
true in an immersive environment where head-mounted displays are used, as 
most HMDs suffer from low resolution and narrow fields of view. 
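
The difference between the two placements comes down to which coordinate frame the menu is attached to. The following sketch assumes a y-up coordinate system with -z as the forward direction and, for brevity, considers only the yaw of the head; the offset and anchor values and the function names are hypothetical.

```python
import math


def view_fixed_menu_position(head_pos, head_yaw_rad, offset=(0.0, -0.2, -0.6)):
    """View Fixed: the menu keeps a constant offset in the head's own frame.

    Positive yaw turns the head to the left (rotation about the vertical y axis).
    """
    ox, oy, oz = offset
    c, s = math.cos(head_yaw_rad), math.sin(head_yaw_rad)
    # Rotate the offset about the y axis, then translate by the head position.
    wx = c * ox + s * oz
    wz = -s * ox + c * oz
    return (head_pos[0] + wx, head_pos[1] + oy, head_pos[2] + wz)


def world_fixed_menu_position(anchor=(0.0, 1.5, -2.0)):
    """World Fixed: the menu stays at its anchor regardless of the head pose."""
    return anchor


head = (1.0, 1.6, 0.0)
print(view_fixed_menu_position(head, 0.0))
print(view_fixed_menu_position(head, math.pi / 2))
print(world_fixed_menu_position())
```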

Aside from location, the following are variants of menu display methods 
(see Figure 8.23). 
Figure 8.22. Menu style by location [KimN00]. (Reprinted with permission from 
IEEE © 2004.) 

Figure 8.23. Variants of VE menu display by items [KimN00]. (Reprinted with 
permission from IEEE © 2004.) 



1. Pull-down (PD). The usual pull-down menu that displays the highest level 
menu items, and shows its branches only during the selection task. 

2. Pop-up (PU). The usual pop-up menu that disappears once the selection is 
made. The menu structure associated with the particular menu selection 
path is shown only on the user's invocation. 

3. Stack Menu. A menu system that persistently displays the selection path 
either at the top portion of the pop-up menu (disappears once the selection 
is done), thus called the Fixed Stack (FS), or at a separate location (e.g., at 
the corner of the screen), thus called the Basket Stack (BS). Only the menu 
options selectable at a given level are shown. 

4. Oblique/Layered (OL). This is a flat menu presentation displayed in an 
oblique fashion, or its structure organized and displayed by layers. 

Here, we consider some possible menu designs, which are listed below and 
illustrated in Figure 8.24. 

1. WF-PD: World Fixed, Pull-down 

2. WF-PU: World Fixed, Popup 

3. VF-FS: View Fixed, Fixed Stack 

4. VF-BS: View Fixed, Basket Stack 

5. VF-OL: View Fixed, Oblique/Layered 

Given the various ways the visual aspect of the menu system can be
designed in a virtual environment for system control, the next design issue
is to decide how an item is selected and confirmed. Table 8.3 classifies the
combinations of methods for positioning and for making the final
confirmation command. For positioning, tracking, gesture, and voice were
considered; for making a command, button input, gesture, and voice were
considered to signal the final yes or no decision. Tracking simply follows
the user's hand or a metaphorical object to designate a desired menu item and
is considered a continuous event-driven modality. Voice and gesture
recognition, which allow users to directly speak the menu item (e.g., start,
enter, escape) or issue positioning commands (e.g., next, previous), are
discrete event-driven modalities. The "X" marks in the table show the
infeasible combinations for the selection task. For instance, the combination
of tracking and voice under the zero-hand column is simply impossible
(tracking uses one hand). The enumeration of the various ways of designing a
menu system and the modality choices illustrates the enormity of the design
space in 3D UI design.

Figure 8.24. The five possible menu designs [KimN00]. (Reprinted with permission
from IEEE © 2004.)

Table 8.3. Combinations of Input Methods Possible by Modalities (3D Tracking,
Gesture, and Voice) and Number of Hands Used.

g. Case Study 7: Whole Body Interaction 

Training has been considered as one of the most natural application areas of 
virtual reality. Here, we describe the interface for a VR-based motion/dance 
training system called "CloneMotion" [Yan01]. CloneMotion is an excellent
example of naturally employing a whole body interface for a more complete 
VR experience. The usual way of learning dance is to observe the teacher, 
follow, and correct the motion. An ideal VR-based dance training might 
require accurate sensing for many of the limbs of the body (e.g., full-body 
motion capture) and an on- or offline evaluation module. As a full- 
body motion capture is impractical for an entertainment system, a low- 
cost marker-based tracker based on infrared cameras was developed for 
approximately tracking the important body positions (wrists, ankles, and 
belly) of the trainee wearing five reflective markers at the respective positions 
(see Figure 8.25). 

The evaluation module compares the original motion data to that of the
trainee (obtained by the motion tracking module) frame by frame. At pre-
designated "key posture" frames, a scoring system indicates to the user how
well that particular posture was followed, using graphic special effects
(e.g., an explosion at the coinciding body positions) and textual
remarks such as "Excellent," "OK," etc. The frame-by-frame scores are
tabulated, summed with appropriate weights, and averaged for a final score.
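
The tabulation described above can be sketched in a few lines. This is only a
minimal illustration and not the CloneMotion implementation: the linear
mapping from marker error to score, the `tolerance` parameter, and both
function names are assumptions made here for the example.

```python
import math

def posture_score(reference, tracked, tolerance=0.5):
    """Score one frame in [0, 1] from the tracked marker positions.

    A marker that coincides with its reference position contributes 1;
    the contribution falls linearly to 0 at `tolerance` (an assumed unit
    of distance) and stays 0 beyond it.
    """
    total = 0.0
    for ref, trk in zip(reference, tracked):
        error = math.dist(ref, trk)          # Euclidean distance between markers
        total += max(0.0, 1.0 - error / tolerance)
    return total / len(reference)

def final_score(frame_scores, weights):
    """Weighted average of the per-frame scores."""
    weighted_sum = sum(s * w for s, w in zip(frame_scores, weights))
    return weighted_sum / sum(weights)
```

Giving key posture frames a larger weight than ordinary frames would make
the final score emphasize the postures the trainee is explicitly shown.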

Thus, the user follows a character dancing with pre-captured motion
data, and the tracked motion data of the user is compared to the original for
evaluation. To help the user follow the dancing character better and
correct the motion in real time (rather than after the dance sequence is
over), the concept of a sliding ghost was used, which shows discrete freeze-
frames of the next imminent postures along with the continuous dance
motion. As the trainee tries to follow the character on the screen, the
evaluations and corresponding special effects at the key posture frames give
a feel for how well one is following, and a final score is given at the
end. The interface development process for CloneMotion illustrates well
how the trade-off among cost, usability, whole body experience, and
evaluation accuracy is made.

Figure 8.25. A user trying to follow the dancing character. The dancing character is
shown with successive freeze-frames of the postures so that the user can see and
expect what kind of motion to follow moment by moment.

h. Case Study 8: Tangible Interface for Product Evaluation 

Recently, "immersive" virtual reality systems have been proposed as a more
effective platform for analyzing an evolving design because of,
among other things, the natural style of interaction they offer when
examining the product, such as the use of direct and proprioceptive
interaction, head-tracking and first-person viewpoint, and multimodality,
compared to a desktop graphic rendering.

It goes without saying that the foremost requirement of any effective
virtual analysis system is to provide sufficient visual realism,
especially for the product itself (vs. the scene that surrounds it). Another
related requirement is to have the virtual product match the real one in
size. This is especially important for small-sized products such as the mobile
phone. The third probable requirement for effective evaluation of small-
sized handheld products is direct interaction. A requirement related to direct
interaction is the provision of the tactile/haptic modality in addition to the
usual, and relatively easy to provide, visual and aural interaction.

The popular ground-based systems, such as the Phantom 2 manipulators,
can only simulate forces and textured surfaces at a point of contact. An
exoskeleton glove device such as the CyberGrasp 3 system is very expensive
and inconvenient to use. An alternative approach is to use a prop. A prop is
an interaction device that represents the virtual object (to be interacted
with), and whose shape and/or appearance matches that of the actual physical
object. Props can be spatially registered with virtual objects, providing
inexpensive physical feedback to the user.

2 Phantom is a registered trademark of Sensable Technologies Inc.

3 CyberGrasp is a registered trademark of Immersion Corporation.

Figure 8.26. Tracking the prop and segmenting the image of the hand/fingers
holding it. In order to track the prop using computer vision, 10 yellow markers
are put on the blue-colored rectangular prop [LeeSY04]. (Reprinted with
permission from IEEE © 2004.)

Props allow us to add inexpensive physical and tactile feedback, signifi- 
cantly increasing presence for immersive environments and establishing a 
common frame of reference between the device and desktop 3D user
interfaces. The introduction of tactile augmentation allows us to explicitly
control the realism of virtual environments. The disadvantage of props is that
each prop represents only one object. In light of this, designing a prop (or
interaction device) that looks exactly like the actual phone (to be tested) is 
not only restrictive in its applicability, but also defeats the very purpose of 
using virtual products (that is, we would like to eliminate the need of 
building physical mockups or prototypes as much as possible). Instead it 
would be possible to design a "reconfigurable" prop that represents a family 
of products. For instance, as for mobile phones, the designed prop is just a 
flat rectangular box (as most mobile phones are roughly rectangular) with 
push-button switches on it. Figure 8.27 shows a prop with two button 
switches for a few simple functions. The tracked prop appears in the virtual 
space as various mobile phones. 

The use of a "representative" prop (i.e., one that only represents the actual
object, but is not the actual object) necessitates the use of a head-mounted
display (rather than a desktop or projection display) so that the virtual
product (or mobile phone) can be rendered at the position of the prop in the
virtual space without distraction. This is more inconvenient for the user and
makes the overall system more expensive.

Figure 8.27. The rectangular prop representing the mobile phone. The prop has
10 yellow markers for vision-based registration and two wireless switches
(labeled Fold/Unfold and Power on/off) for interaction [LeeSY04]. (Reprinted
with permission from IEEE © 2004.)

A problem with using the HMD is that, in the virtual space, the users have to
interact with the target product using virtual hands, and this is expected to
reduce the feeling of directness and realism. To address this, a VR interface
is designed that shows the real hands, segmented out from the real scene
captured by the camera using a computer-vision technique (see Figures 8.26
to 8.28).

i. Case Study 9: Alphanumeric Input 

Even in virtual environments, there are occasions when alphanumeric input
is required. The most typical and simplest method of enabling alphanumeric
input in a VR setting is through a virtual terminal with a virtual keyboard.
A virtual terminal is a virtual object that functions as computer
terminals do in the real world. Virtual terminals may have similar
appearances to real-world terminals (i.e., having a display device, keyboard,
mouse, and other items), and users may interact with them in the same way
in which they use real terminals for programming. Alternatively, a virtual
terminal could be represented with just a single flat virtual panel showing
text and 2D diagrams, to which users give input by handwriting, gestures,
voice recognition, or any other method. Figure 8.29
shows the virtual keyboard registered with the real hand, and a note-taking
application overlaid on the virtual environment.

Figure 8.28. What is seen by the user through the HMD, while holding the prop to
interact with the virtual mobile phone. Note the seam between the image-captured
hand and the virtual cell phone [LeeSY04]. (Reprinted with permission from IEEE ©
2004.)

Virtual keyboards are often used for information kiosks, PDAs with
touchscreens, and even when a physical keyboard is present, to expand
the range of input characters. For VR systems, the approaches to
alphanumeric input can be categorized by whether they are mobile and wearable
and whether they provide tactility and/or haptics. The mobile/wearable
keyboards are compact and reduced in size (for wearability); thus, they
usually have different key layouts from conventional keyboards. For
instance, Matias' Halfkeyboard [Mat04] is a wearable keyboard worn on the
forearm with only 22 keys, measuring about 146 x 80 x 18 mm and weighing
125 g. The Visual Panel [Zha01] overlays the virtual key layout on a tracked
tablet, held by one hand, on which users can use fingers of the other hand to
select letters in the virtual space. The Finger Joint Gesture Wearable Keypad
[Gol99] uses 12 button switches (laid out as the telephone keypad) attached
to the finger joints, and typing is performed by pressing on them with the
thumb using the Thumbcode. Although these devices provide tactility
and/or haptics (which enhances typing performance, as indicated earlier),
they are difficult to use (e.g., the device must be worn) or learn (e.g.,
typing with one hand only), and are not appropriate for a large amount of
text input.

Figure 8.29. The virtual keyboard registered with the real hand in a virtual
environment.

The Fingering and Acceleration Glove [Fuk97] and Vtype [Eva] are
special devices for tracking the movements of the user's fingers. The fingers
are tracked and tested for collision with the virtual keyboard. These
approaches use the conventional keyboard layout (which users are most familiar
with); however, they suffer from poor device usability and a lack of
haptics/tactility. The importance of leveraging tactility and conventional
typing skill for the successful deployment of an alphanumeric input method is
well illustrated by the approach of VKB [VKB04], which recently unveiled the
Projection keyboard (even though it was not designed for a VR setting). The
Projection keyboard uses an optical device to project a "conventional"
keyboard layout on any flat surface, and a separate vision-based module
tracks and interprets the finger "tapping" movements to enable near-natural
typing.

Another natural method for alphanumeric input is to use voice/speech
recognition, or to combine it with some of the methods described above. The
pioneering work of the "Put that there" system by Bolt et al. demonstrated
the usefulness of such multimodal interaction [Bol80]. However, voice
recognition is still not practical enough for anything other than recognition
of a small set of keywords.

Summary 

Designing an easy-to-use, natural, and efficient interface is a difficult
problem. In most cases, not all the design goals can be met. Interface design
must be preceded by a task analysis that produces an interaction model.
Interfaces are proposed and tried based on the interaction model. Employing a
multimodal interface is an effective way to produce an interface that is
natural (with high user-felt presence) and at the same time efficient.
However, it is vital to carefully maintain multimodal consistency.

Pondering Points 

• Imagine an environment where the user must select various objects far or 
close. Is it better to employ one way of selecting the object far or close, or 
employ two different modes, one for selecting a close object (e.g., direct 
touch) and one for the far (e.g., virtual ray)? Explain your answer. 

• Can a person, in general, listen to music and solve a math problem at the 
same time? How about reading a book while listening to music? What 
does this tell us about applying multimodality in interaction design? 



Chapter 9 

Simulation I: Collision Detection 



So far we have covered several important topics for virtual reality. We first 
covered the systematic process for designing the basic structure of a virtual 
reality system. Then, we considered the need and problem of designing 3D 
multimodal interaction for naturalness, efficiency, and presence. It goes 
without saying that using realistic-looking geometric models and objects 
contributes positively to increasing the sense of presence, and the same 
goes for dynamics and behaviors of objects. Objects exhibit many different
behaviors, and we cannot possibly cover how to formulate and implement all
of them. However, we discuss three important classes of object
simulation: collision detection, physics-based motion, and handling
collision response (which uses the former two).

Handling Collision 

There are largely three things to consider in handling collision: detection of
collision, determining the location of collision (let's call this collision
determination for lack of a better shorthand), and generating a response to a
collision. The accuracy and method of finding the location of the
collision on the respective objects depend on the scheme used
to detect the collision in the first place (e.g., it may be possible to merely
detect the incidence of a collision quickly, but not to determine its exact
location). Once we know there was a collision, and the locations
and directions of the collision on the involved objects are known, a response
must be simulated. The responses can be calculated and simulated
based on a complex physical model or on a simple heuristic formula. We
defer the explanation of the response computation to the later part of
Chapter 10, after introducing the simulation of physics-based motion. In this
chapter, we first look at different ways to detect collision and, in some
cases, determine collision.

The Simulation Loop

1. Read any external input.

2. Compute simulation and update the objects and scene graph.
   2.1 Check for any collision and generate response.

3. Update the camera position/orientation.

4. Redraw the scene.

5. Go back to 1.

Figure 9.1. The overall computation model for a VR program.

As we already know, geometric models used in 3D graphics or VR systems
are made of polygons. The most naive way, thus, to detect and determine
collision is to perform a pairwise "piercing" test between all the polygons in
the scene (see Figure 9.2). In the simpler case, the polygons might all be
triangles. If there are n moving objects and m nonmoving static objects in the
scene, we are looking at $(n \cdot m + {}_{n}C_{2})$ object combinations that
can possibly be in collision, where ${}_{n}C_{2} = n(n - 1)/2$ is the number of
pairs of moving objects. For each object combination, with objects of at most
k polygons, there are $k^2$ polygon pairs to check for collision (or
piercing each other). In total, there are $(n \cdot m + {}_{n}C_{2}) \cdot k^2$
polygon piercing tests. The collision testing and response generation are to
be done at every tick of the simulation loop (at a minimum refresh rate of
15 Hz), each time the scene is updated by a small amount (see Figure 9.1).
This is, even for a moderately populated virtual scene, computationally heavy
to handle in real time (in a fraction of one simulation tick). Also note that
for an object that moves so fast that it passes through another object during
one simulation tick, the collision would not be detected. Still, the
polygon-level piercing test gives a good enough approximation of the location
of the collision (i.e., we only know which triangles pierce each other, but
not exactly where within the triangle; see Figure 9.2). To find the exact
collision locations "on" the respective polygon surfaces, further calculations
would be required.
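
To get a feel for the size of this number, the count can be evaluated for a
hypothetical scene. The figures used below (10 moving objects, 50 static
objects, and 1,000 polygons per object) are illustrative assumptions, not
values taken from the text, and the function names are made up for the
example.

```python
from math import comb

def object_pairs(n_moving, m_static):
    """Moving-vs-static pairs plus moving-vs-moving pairs: n*m + nC2."""
    return n_moving * m_static + comb(n_moving, 2)

def polygon_tests(n_moving, m_static, k):
    """Naive polygon piercing tests per tick, with at most k polygons per object."""
    return object_pairs(n_moving, m_static) * k ** 2

tests_per_tick = polygon_tests(10, 50, 1000)
print(tests_per_tick)        # 545000000
print(tests_per_tick * 15)   # 8175000000 tests per second at 15 Hz
```

Even this modest scene requires several billion piercing tests per second,
which is why the filtering and bounding-volume techniques matter.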

Thus, the challenge is to reduce the amount of computation as much as
possible yet provide the right amount of information about
the collisions for a given application. One way to accomplish this is to filter
out the unreasonable pairwise (objects and/or polygons) testing possibilities
(reducing the numbers in the first part of the complexity equation, n and m).
This requires knowledge or assumptions about the environment, for instance,
knowing that certain objects will never collide with each other, or knowing
in advance the region of interest of the user (i.e., we do not care about
correct collision simulation in other parts of the scene). The other way is to
reduce the number of polygons to test, that is, the second part of the
complexity equation, $k^2$, by using bounding volumes. A bounding
volume of an object is a simplified geometric object representing (and
containing) the original with far fewer polygons. Naturally, when only
bounding volumes are used, the exact location of the collision cannot be
determined. The overall process can be made more efficient still by using a
hierarchical object structure: we can intelligently search and test first the
parts of the two objects most likely to collide. This reduces the overall
average computational complexity dramatically.

Figure 9.2. Polygon piercing test between two objects made of polygons/triangles.
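
As a concrete illustration of a bounding volume used as a cheap pre-filter,
the sketch below wraps each object in a sphere and compares only the two
spheres. The sphere is chosen here purely for simplicity (the text does not
prescribe a particular bounding shape), the sphere computed is a containing
sphere rather than the smallest one, and the function names are assumptions.

```python
import math

def bounding_sphere(vertices):
    """Return (center, radius) of a sphere containing all vertices.

    The center is the vertex average and the radius is the distance to the
    farthest vertex, so the sphere is valid but not necessarily minimal.
    """
    n = len(vertices)
    center = tuple(sum(v[i] for v in vertices) / n for i in range(3))
    radius = max(math.dist(center, v) for v in vertices)
    return center, radius

def spheres_overlap(sphere_a, sphere_b):
    """True if the two bounding spheres touch or intersect."""
    (center_a, radius_a), (center_b, radius_b) = sphere_a, sphere_b
    return math.dist(center_a, center_b) <= radius_a + radius_b
```

Only object pairs whose spheres overlap would be passed on to the expensive
polygon-level test; a positive sphere test alone says nothing about where,
or even whether, the underlying polygons actually touch.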

As we explain the different approaches to collision detection and deter- 
mination, we go over the necessary basic geometric calculation required to 
realize these approaches. 

Collision Detection with Line Segment(s) 

In certain cases, for practical purposes, it suffices to approximate an object
with short line segment(s). For instance, a car may be augmented with two
short line segments emanating from the
bottom of its wheels in the direction perpendicular to the horizontal plane
of the body (see Figure 9.3). Collision checking between these segments
and the terrain polygons can be used to ensure that the car is rendered right
on the ground (and to prevent it from hovering over or penetrating
underneath an irregularly shaped terrain).

Line segments can also be used on the virtual hands or fingers for object
selection. The virtual hand might look like a hand with its index finger
pointed out, where a "line" of contact is defined (usually, only one sensor
is used to track some representative position of the hand such as the
index fingertip). Here we can put a virtual line segment on (or along)
the index finger and test collision between the segment and the target object.
The object is selected, or object interaction is initiated, by
approximately detecting the collision between the hand (or actually the line)
and the target object.

Figure 9.3. Collision detection with objects approximated with line segments: (a) car
on a terrain; (b) virtual hand.

Line Segment (Ray) Versus Triangle [Moo88]

Suppose that, in the above example, we want to check if the line segments
from the car are in collision with a subset of polygons from the terrain.
Further suppose the polygons are all triangles. The lowest-level test
procedure we need is to check if the line segment penetrates a triangle, or
more precisely, to compute the distance from the ray origin to the triangle
(i.e., the penetration distance) in the direction of the ray. A point on a
triangle can be represented by the following formula, where $v_0$, $v_1$,
$v_2$ are the three vertices of the triangle and u, v are the barycentric 1
coordinates of the point (with respect to the given triangle).

$p(u, v) = (1 - u - v)\,v_0 + u\,v_1 + v\,v_2$

1 Barycentric coordinates are coordinates formed by expressing a point in
(n - 1)-dimensional space with respect to n designated points that span the
(n - 1)-dimensional space. For instance, two designated points define a line
(one-dimensional space) and by a weighted sum of these two designated points,
a point on the line can be specified, and similarly for three designated
points on a two-dimensional triangle, and four points on a three-dimensional
tetrahedron. The weights must sum to one, and each must be nonnegative, to
represent a point within the respective shape (line segment, triangle,
tetrahedron, etc.).

A line segment (or interchangeably a ray) is represented by an equation:
$o + t\,d$, where o is the origin of the ray, t is a real number, and d is
the unit direction vector of the line. The range of t defines the bounds of
the line segment (t > 0, and less than the length of the segment). We equate
the two equations (point on triangle and the ray).

$o + t\,d = (1 - u - v)\,v_0 + u\,v_1 + v\,v_2$

After rearranging the terms, we get a set of linear equations and can solve
for the values u, v, and t. By formulation, t must be greater than 0, and u
and v must each be nonnegative with u + v less than or equal to 1 to specify
a point within (on) the triangle (otherwise, the ray does not cross the
interior of the triangle). Once t is known, the distance between the origin
of the ray and the point on the triangle where the ray would hit can be
computed easily and a threshold value can be used to invoke a collision event
(see Figure 9.4).
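
One way to solve the resulting linear system is sketched below. Moving the
unknowns to one side gives -t d + u (v1 - v0) + v (v2 - v0) = o - v0, which
is solved here with Cramer's rule written in terms of cross and dot products
(the arrangement commonly known as the Möller-Trumbore test). This particular
arrangement, the helper names, and the `max_t` and `eps` parameters are
choices made for this example and are not necessarily the procedure of the
cited reference.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(o, d, v0, v1, v2, max_t=float("inf"), eps=1e-9):
    """Return (t, u, v) if the ray o + t*d hits the triangle, else None.

    max_t bounds the segment length; with a unit vector d, t is the
    distance from o to the hit point.
    """
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                  # ray is (nearly) parallel to the triangle
    s = sub(o, v0)
    u = dot(s, p) / det
    q = cross(s, e1)
    v = dot(d, q) / det
    t = dot(e2, q) / det
    if u < 0 or v < 0 or u + v > 1:  # hit point lies outside the triangle
        return None
    if t <= 0 or t > max_t:          # hit point lies outside the segment
        return None
    return t, u, v
```

For the car example, `o` would be the bottom of a wheel, `d` the downward
direction of the car body, and `max_t` the length of the short segment; the
returned t can then be compared against a threshold to raise the collision
event.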

Figure 9.4. A ray and a triangle.

Ray Versus Polygon [Mol02]

Suppose the polygons were not all triangles. Then, we need a more general
formulation. Given a ray and a polygon, we first compute if the ray passes
through the infinite plane formed by the polygon (the infinite plane that
coincides with the polygon surface). Let's denote $n_p$ as the normal of the
infinite plane (this can be computed from the vertices of the polygon). Then,
the plane equation can be set up as

$n_p \cdot x = D$

where x is a point on the plane and D is the perpendicular distance from the
origin to the plane (for a unit-length $n_p$; see Figure 9.5). We substitute
the line equation for x and solve for a valid t (t that is within the range):

$n_p \cdot (o + t\,d) = D$

where o is the origin of the ray and d is the unit direction vector of the ray. If
$|n_p \cdot d|$ is less than a pre-set threshold (near zero), then the ray is
(or nearly is) parallel to the polygon and is treated as not crossing the
polygon. Otherwise, we find the value of t and use it to find the point p
(the point the ray would hit on the infinite plane), and see if p lies within
the polygon.

Figure 9.5. A ray and a polygon. (The figure labels the plane P on which the
polygon lies.)
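
Solving the substituted plane equation for t gives t = (D - n_p . o) /
(n_p . d), which the following sketch evaluates. The function name, the
`t_max` bound standing in for "t within the range," and the threshold value
`eps` are assumptions introduced for this example.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ray_plane(o, d, n_p, D, t_max=float("inf"), eps=1e-9):
    """Return the point where the ray o + t*d meets the plane n_p . x = D.

    Returns None when the ray is (nearly) parallel to the plane or when the
    hit lies outside the allowed range 0 <= t <= t_max.
    """
    denom = dot(n_p, d)
    if abs(denom) < eps:
        return None                       # parallel: treated as no crossing
    t = (D - dot(n_p, o)) / denom
    if t < 0 or t > t_max:
        return None                       # plane is behind the ray or too far
    return tuple(o[i] + t * d[i] for i in range(3))
```

The returned point is only known to lie on the infinite plane; whether it
also lies inside the polygon still has to be decided by a separate
containment test.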

In order to do this, we project all the vertices onto a two-dimensional
plane (the xy or yz plane, whichever maximizes the projected area) by simply
dropping the z- or x-coordinate value. To determine if p is within the
polygon, we apply the Jordan curve theorem, which states, informally, that if
we shoot a ray from p in the positive x-direction and the number of crossings
with the edges of the polygon comes out to be an odd number, then the point
lies within the polygon (see Figure 9.6).

Figure 9.6. The Jordan curve theorem for checking insideness. (In the figure,
a ray that crosses the polygon boundary 3 times, an odd number, starts inside
the polygon; a ray that crosses it 2 times, an even number, starts outside.)

To apply the Jordan curve theorem, we translate the point p into a new
coordinate system where p serves as the origin. Then, we perform a line
versus ray (originating from p and going in the positive x-direction)
intersection test for each edge of the polygon (note that the vertices are
likewise translated accordingly). The algorithm can be made simpler by ruling
out some simple cases, as shown in Figure 9.7.

An intersection test between the infinite ray and an edge (the last part of
the algorithm) can be done simply by equating the two line equations.

$o_1 + s\,d_1 = o_2 + t\,d_2$

One vertex of the edge is $o_1$, and the other is $o_1 + s\,d_1$ with a known
s. The direction $d_2$ is the +x-direction and $o_2$ is the origin to which p
has been translated. The value of t, which must be greater than zero for a
crossing, can be found easily.

To summarize, the ray versus triangle (or polygon) test would have to be
applied in a pairwise fashion to all the potentially intersecting pairs. This
is still a lot of computation, even though slightly less than pairwise
polygon-polygon testing. Reducing one object down to a couple of rays is an
efficient way to go; however, another avenue for improvement would be to
reduce the set of potentially intersecting pairs. Although possible,
simplifying both objects in the potentially intersecting pairs to rays is
rare. Rather, there usually is a key active object reduced to a ray (such as
the moving finger in the object selection example) for which intersection
with other objects (or their bounding volumes) is tested.

c = 0   /* c is the number of crossings */
For each edge,
    /* check the y coordinates of vertices */
    If same sign, Then cannot cross   /* edges e1, e2, e4, e5 in the figure */
    Else
        /* check the x coordinates of vertices */
        If both positive Then c = c + 1   /* edge e6 in the figure */
        Else if both negative Then cannot cross   /* edge e3 in the figure */
        Else   /* x coordinates differ in sign */
            Compute infinite ray (x axis) vs. edge intersection and update c value

Figure 9.7. A fast algorithm for implementing the Jordan curve theorem. (The
accompanying drawing shows point p and a polygon with edges e1 to e6; the ray
from p makes 1 crossing, an odd number, so p is inside.)
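
The crossing-count algorithm of Figure 9.7 can be written out as a short
function. This is a sketch that assumes the polygon has already been
projected to 2D and is given as an ordered list of (x, y) vertices; treating
a coordinate of exactly zero as "positive" for y and as "negative" for x is a
tie-breaking convention chosen for this example, not something specified in
the figure.

```python
def point_in_polygon(p, polygon):
    """Jordan curve test: count crossings of a +x ray from p with the edges."""
    px, py = p
    crossings = 0
    n = len(polygon)
    for i in range(n):
        # Translate both edge endpoints so that p becomes the origin.
        x0, y0 = polygon[i][0] - px, polygon[i][1] - py
        x1, y1 = polygon[(i + 1) % n][0] - px, polygon[(i + 1) % n][1] - py
        if (y0 >= 0) == (y1 >= 0):
            continue                  # same side of the x axis: cannot cross
        if x0 > 0 and x1 > 0:
            crossings += 1            # edge is entirely on the +x side
        elif x0 <= 0 and x1 <= 0:
            continue                  # edge is entirely on the -x side
        else:
            # x coordinates differ in sign: find where the edge meets y = 0.
            x_cross = x0 - y0 * (x1 - x0) / (y1 - y0)
            if x_cross > 0:
                crossings += 1
    return crossings % 2 == 1
```

Because only the parity of the count matters, the function also works for
concave polygons, as long as the vertices are listed in order around the
boundary.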

Polygonal Objects Versus Polygonal Objects 
(Collision Detection Only) [Can86] 

It is not possible to simplify an object as a ray all the time, for various 
reasons, for instance, for a more accurate collision detection and determin- 
ation (and thus to produce a more realistic response also). Given two 
potentially moving convex polygonal models A and B, Canny et al. 
[Can86] observed there can be mainly these cases of collision-related events 
that can occur between A and B (also see Figure 9.8). 

1. Some vertices of B are penetrating faces of A.

2. Some vertices of A are penetrating faces of B.

3. The edges of A and B meet at a point.

Figure 9.8. Three major cases of collision events between two convex polygonal
objects. (Adapted from [Can86] with permission from IEEE © 2004.)

For Cases 1 and 2, we need to check if any vertices of B (or A) are
penetrating the faces of A (or B). If the purpose is just to check for the
penetration, the simple piercing test can be used as follows. Given a vertex
v and a polygon (face), we can decide if v is on one side (inside) or on the
other (outside) by the following test.

$d = n_p \cdot (v - q)$ ($n_p$ is the outward normal vector of the polygon
and q is any vertex of the polygon)
If d = 0, then v is on the plane of the polygon
If d < 0, then v is inside the surface of the polygon
(interior of the polygonal model)
If d > 0, then v is outside the surface of the polygon
(exterior of the polygonal model)

The piercing test is applied for one vertex against all polygons of the
other object, and if any one vertex is on the inner side of all polygons of
the other object, then the vertex has penetrated the other object and the
collision is detected.
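
The vertex piercing test can be sketched as follows for a convex object. The
representation of a face as a pair (outward normal, point on the face) and
the function names are assumptions made for this example; measuring d
relative to a point q on each face keeps the sign test independent of where
the coordinate origin happens to be.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def vertex_inside(v, faces):
    """True if v is on the inner side (d < 0) of every face.

    faces: list of (n_p, q) pairs, where n_p is the outward normal of a
    face and q is any point on that face. Valid for convex objects only.
    """
    return all(dot(n_p, sub(v, q)) < 0 for n_p, q in faces)

def vertices_penetrate(vertices_b, faces_a):
    """Cases 1 and 2: does any vertex of one object lie inside the other?"""
    return any(vertex_inside(v, faces_a) for v in vertices_b)
```

Calling `vertices_penetrate` twice, once with the vertices of B against the
faces of A and once with the roles swapped, covers Cases 1 and 2; the
edge-edge configuration of Case 3 is not detected by this test.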

If the test finds that the objects are in neither Case 1 nor Case 2, then the
third case is checked with the equation

$(p_a - p_b) \cdot (e_a \times e_b) = 0$

where $e_a$ is an edge of A, $e_b$ is an edge of B, and $p_a$ and $p_b$ are
any positions on $e_a$ and $e_b$. The equation checks that the edges $e_a$
and $e_b$ are coplanar (a necessary condition for them to meet at a point).

Although the above three cases cover most of the relative spatial
configurations of two objects at a given instant of time, note that the list
is not complete. For instance, the rare cases illustrated in Figure 9.9 are
not detected by the three cases above.

Figure 9.9. Interfering configurations not covered by the three conditions in Figure
9.8 (illustrated in 2D for simplicity): (a) none of the vertices of A or B penetrates
another, yet they are in collision; (b) the condition of collision put forth by Canny
does not work for concave objects.

Triangularized Objects Versus Triangularized Objects:
The Interval Overlap Method [Mol02]

For two "triangularized" objects, we can certainly apply the above polyg- 
onal model versus polygonal model approach. Or, we can test if triangles of 
two objects are piercing each other pair by pair, instead of Canny's method 
that pairs polygons/triangles and vertices, testing if any vertex of one trian- 
gularized polyhedron is on the other side of all the other "closed" triangu- 
larized polyhedra (we later compare the algorithmic complexities between 
the two methods). Now, we must devise a simple method of testing (or 
determining) collision between two triangles (triangle vs. triangle piercing 
test). Suppose that triangle t\ is on infinite plane tt\, and t2 on 7r 2 (see Figure 
9.10). 

7i\ '. n\'X+ dl = 0 
iT2: «2«x + <i2 = 0 

where n\,n2 are obtained easily from vertices of t\ and t2. 

We try to compute the signed distance from vertices of t\ to 77-2. Suppose 
that vertices of t\ are denoted as u t (where i = 1, 1,2). 

«2 • Ui + d2 = d U i (i = 0, 1,2) 

Thus, d ui denotes the signed distance from each vertex of tl to tt 2 . (Note that 
this is the same as Canny's "other side test" (or vertex piercing) using the dot 
product.) If all three signed distances are not equal to zero and have the same 
signs, then there is no overlap (the triangle lies on the other side of the other). 
If they are all zero, then the triangle lies on the same plane, and their overlap 
can be tested using the segment versus segment test and point containment 
test (see previous sections). So far, this is mostly equivalent to Canny's 




Figure 9.10. Testing for overlap of two triangles on the same plane. 
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Figure 9.11. Two triangles may be in collision or not. Two infinite planes meet at the 
line L [Mol02], (Reprinted with permission from A. K. Peters © 2004.) 

method. When the signs are different among the three signed distances, the 
triangles may be piercing or may not be (see Figure 9.11). 

If the signs are different among the three signed distances, then $\pi_1$ and
$\pi_2$ intersect at a line (see Figure 9.11). Let's call that line
$l = o + t\,d$, where $d$ is the direction vector easily obtained by
$n_1 \times n_2$. We compute the portion of each triangle that overlaps on
this line, and carry out an interval test to see if those portions overlap.
If these portions do overlap, then the two triangles pierce through each
other. The portion of one triangle that overlaps on the line $l$ is
illustrated in Figure 9.12 as a segment between $t_1$ and $t_2$. In order to
obtain the values $t_1$ and $t_2$, we use the similar triangle theorem and
compare the distances $d_{u_0}$ and $d_{u_1}$ to the lengths
$\overline{p_{u_0} t_1}$ and $\overline{t_1 p_{u_1}}$. Here, $p_{u_0}$ is the
point $u_0$ projected on the line $l$, and similarly for $p_{u_1}$. Also,
$p_{u_0}$ is obtained by a simple dot product between $d$ and $(u_0 - o)$, and
similarly for $p_{u_1}$. The similarity holds between triangles
$\triangle u_0 b k_0$ and $\triangle u_1 b k_1$; thus the following equation
holds.

Figure 9.12. Computing the interval the triangle overlaps on line $l$ (the geometric
situation) [Mol02]. (Reprinted with permission from A. K. Peters © 2004.)
$(t_1 - p_{u_0}) / (t_1 - p_{u_1}) = d_{u_0} / d_{u_1}$

$t_1 d_{u_1} - p_{u_0} d_{u_1} - d_{u_0} t_1 = -p_{u_1} d_{u_0}$
$t_1 = (p_{u_0} d_{u_1} - p_{u_1} d_{u_0}) / (d_{u_1} - d_{u_0})$

We can get $t_2$ in a similar way, and get the other values of t for the other
triangle (let's call them $t_3$ and $t_4$). If the interval formed by $t_1$ and
$t_2$ overlaps the interval formed by $t_3$ and $t_4$, the two triangles overlap.
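
The interval overlap test can be sketched in code as follows. This is a minimal illustration under stated assumptions rather than the implementation from [Mol02]: the function names, the tolerance `eps`, and the choice to raise an error for coplanar triangles are ours. The line origin o is dropped because it shifts both intervals by the same amount, and the plane normals are left unnormalized because only the signs and ratios of the distances matter.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_of(tri):
    """Return (n, d) such that n . x + d = 0 on the plane of the triangle."""
    n = cross(sub(tri[1], tri[0]), sub(tri[2], tri[0]))
    return n, -dot(n, tri[0])

def interval_on_line(tri, dists, direction, eps):
    """Interval (t_min, t_max) that the triangle covers on the intersection line."""
    p = [dot(direction, v) for v in tri]          # vertices projected on the line
    ts = [p[i] for i in range(3) if abs(dists[i]) <= eps]  # vertices on the plane
    for i, j in ((0, 1), (1, 2), (2, 0)):
        di, dj = dists[i], dists[j]
        if (di > eps and dj < -eps) or (di < -eps and dj > eps):
            # Edge crosses the plane: t = (p_i d_j - p_j d_i) / (d_j - d_i)
            ts.append((p[i] * dj - p[j] * di) / (dj - di))
    return min(ts), max(ts)

def triangles_pierce(t1, t2, eps=1e-9):
    n1, d1 = plane_of(t1)
    n2, d2 = plane_of(t2)
    dist1 = [dot(n2, v) + d2 for v in t1]   # vertices of t1 against plane of t2
    dist2 = [dot(n1, v) + d1 for v in t2]   # vertices of t2 against plane of t1
    for dist in (dist1, dist2):
        if all(x > eps for x in dist) or all(x < -eps for x in dist):
            return False                     # one triangle entirely on one side
    if all(abs(x) <= eps for x in dist1):
        raise ValueError("coplanar triangles: use the 2D segment/containment tests")
    direction = cross(n1, n2)                # direction d of the line l
    a_lo, a_hi = interval_on_line(t1, dist1, direction, eps)
    b_lo, b_hi = interval_on_line(t2, dist2, direction, eps)
    return a_lo <= b_hi and b_lo <= a_hi     # interval overlap test
```

Collecting one parameter per crossing edge (plus any vertex lying on the plane) avoids having to identify which vertex is alone on its side of the plane, at the cost of a few extra comparisons.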

An exact comparison of this method to that of Canny is difficult, because
of the different assumptions made. In Canny's method, a simple dot product
test is carried out for all vertices of A against all polygons of B (the
first case), and likewise for the vertices of B against the polygons of A
(the second case), until a vertex contained in the other polyhedron is found.
Suppose a polyhedron is composed of k polygons, m vertices, and n edges. On
the average, the total pairwise tests amount to two (k * m)/2 tests (for
Cases 1 and 2) plus n (for Case 3); thus, O(k * m + n). As for the one-by-one
triangle piercing test, O(k * k) comparisons have to be made on the average
(if objects are made of k triangles). For low-count polyhedra, the one-by-one
triangle piercing tests can be carried out faster than Canny's method and, in
addition, they perform collision determination. Such a situation can arise if
we compare two hierarchically organized bounding volumes where the low-level
primitive (triangle piercing) test is done between two very low-count
polygonal entities. Note that the collision determination for two polygons
would involve much more computation (not derived in this book).



Polygonal Objects Versus Polygonal Objects II [Moo88]

A slightly different method of polygon versus polygon was proposed by
Moore and Wilhelms [Moo88] that tests if the edges of one polyhedron pierce
through the faces of the other. Given polyhedra P and Q, we check if edge
$v_i v_j$ of Q intersects with the infinite planes that contain the faces of
P. If the perpendicular distances from $v_i$ and $v_j$ to the infinite plane
change sign, then the edge pierces through the infinite plane, and that point
along the edge can be computed as follows.

$u_k$: vertex of the polygon p (which would be on the infinite plane)
$d_i = (v_i - u_k) \cdot n_k$ (plane equation with respect to $v_i$)
$d_j = (v_j - u_k) \cdot n_k$ (plane equation with respect to $v_j$)
Point on line: $x = v_i + t(v_j - v_i)$
$t = |d_i| / (|d_i| + |d_j|)$

Thus, according to the above formulation, we would obtain a series of values
of t for one edge, one for each infinite plane. Any values outside the range
from 0 to 1 should be discarded, and the mid-point of the intersections
(represented by the values of t) is checked for insideness (using the dot
product; see Figure 9.13).

Figure 9.13. An edge piercing through a polyhedron.
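
As a small illustration of the edge versus plane step, the sketch below computes t for one edge against one face plane. The function names are hypothetical, and the convention of returning `None` when the edge does not cross the plane (or lies in it) is an assumption made here for convenience.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def edge_plane_parameter(v_i, v_j, u_k, n_k):
    """Return t in [0, 1] where edge v_i v_j pierces the plane, or None."""
    d_i = dot(sub(v_i, u_k), n_k)   # plane equation with respect to v_i
    d_j = dot(sub(v_j, u_k), n_k)   # plane equation with respect to v_j
    if d_i * d_j > 0 or d_i == d_j:
        return None                 # same side, or edge parallel to / in the plane
    return abs(d_i) / (abs(d_i) + abs(d_j))

def point_on_edge(v_i, v_j, t):
    """x = v_i + t (v_j - v_i)"""
    return tuple(a + t * (b - a) for a, b in zip(v_i, v_j))
```

For example, with the plane z = 0 (u_k at the origin, n_k = (0, 0, 1)) and an edge from (0, 0, -1) to (0, 0, 3), the parameter is 0.25 and the piercing point is the origin.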

Voronoi Region Method [Lin95] 

A method called the Voronoi region method was proposed by [Lin95]. It has
been implemented as a public domain package called the V-Clip collision
detection/determination package (it has evolved and been optimized over the
years since it first became available in 1994). This is known to be one of
the fastest methods (almost constant time on the average) of collision
detection/determination, but because of the mathematical complexity of the
algorithm, we only sketch the principle of how it works, "the closest pair of
features." In this method, the concept of a Voronoi region of a feature is
used. A feature of a polyhedron refers to a vertex, edge, or face of the
polyhedron. A Voronoi region of a given feature refers to the set of points
that are closer to the feature than to any other feature of the polyhedron.
In Figure 9.14, for instance, the space formed by the infinite planes around
the edge e of the cuboid object constitutes the Voronoi region for feature e.
The Voronoi region for the vertex v of the pyramidal object is also
illustrated in the figure. There is a mathematical theorem that states that
if the Voronoi regions of two features of two objects overlap, then they must
be the closest features between the two objects. Thus, in Figure 9.14, the
edge e of the cuboid and the vertex v of the pyramid must be the closest
features between the two. Once this is known, the interference between the
two objects is a matter of checking the "distance" between the two closest
features.

Thus, the algorithm must first start searching for the closest pair of
features between two polyhedra, and in the worst case, this involves a full
pairwise test among the features and checking for the overlap of their
Voronoi regions. But usually heuristics can be used to make good guesses,
and the pair can be found quite fast. Once a closest feature pair is found,
assuming that the objects move by small increments with respect to the
simulation tick and with respect to each other, it can be assumed that the
closest features will not change abruptly. In the case where the closest
feature pair is no longer valid, the new closest feature pair should involve
features in the close neighborhood, and thus can be found very fast (see
Figure 9.15).

Figure 9.14. Overlapping Voronoi regions of two polyhedra.

Bounding Volumes 

A bounding volume (BV) of an object is another object that completely
encloses the given object. In the context of collision detection and
determination, bounding volumes are usually simple geometric solids, such as
rectangular boxes and spheres, that have far fewer polygons than the object
they represent. Thus bounding volumes approximate the shape (or spatial
occupation) of a given object so that the needed number of low-level
primitive tests (e.g., polygon vs. polygon, triangle vs. triangle, etc.) is
reduced (see Figure 9.16).

Figure 9.15. Changing closest feature pair as the objects move.

This might work for many applications where inexact collision detection
suffices. However, for more intricate manipulation of objects and motion
representation, this would be an overapproximation. A better, yet not so
computationally burdensome, method would be to organize the object as a
hierarchy of bounding volumes (see Figure 9.17). With hierarchically
organized bounding volumes, two objects can be checked for intersection in a
recursive manner, either to reject the possibility of the interference
quickly or to zoom down quickly to a local portion of the respective objects
and then test for exact collision determination between the respective
regions of interest.

Figure 9.16. Approximating objects with different styles of bounding volumes.

For instance, given two objects A and B, each represented by a hierarchy of
bounding volumes, we first check if the topmost bounding volumes intersect
(let's call these $A_{bv0}$ and $B_{bv0}$). If these bounding volumes do not
intersect, then there is no way the objects they represent can interfere
with one another, by the definition of bounding volumes (BVs always
completely contain the objects). If the volumes intersect each other, there
is a possibility that the objects may intersect, and we go down the
hierarchy for further checks. We check interference between the BV of one of
the objects (for instance, $A_{bv0}$) against the children BVs ($B_{bv1i}$,
i = 1, 2, ..., n) of the BV of the other object ($B_{bv0}$). If the objects
collide, at least one of these n children of B (i.e., $B_{bv11}$ ~
$B_{bv1n}$) must intersect with the topmost BV of the object A ($A_{bv0}$).
This way, the part of B that might intersect with A is identified (let's
call it $B_{bv1k}$, where k is between 1 and n). This child BV of the object
B ($B_{bv1k}$) can then be used to find the part of A (one of the children
BVs of object A, $A_{bv1i}$) in collision with object B. This is done by
checking for interference between the child BV of object B (i.e.,
$B_{bv1k}$) and the children BVs of A (i.e., $A_{bv1i}$, i = 1, 2, ..., n).
This process continues either until there is no interference found among the
child BVs, or until the tree exhausts its leaf nodes. The leaf node of the
object represents a part of the object that might be in collision, but
cannot be decomposed into any further subbounding volumes. Thus, if two leaf
nodes are interfering with each other, one final test remains: applying the
interference test among the primitives (such as triangles or polygons)
contained in each of the leaf node bounding volumes. The process is
illustrated in Figure 9.18. We can readily see that between two objects with
k polygons, the computational complexity comes down from O(k * k) (pairwise
test) to O(log k + l * l), where l is the number of polygons contained in a
leaf node (a very low number, much less than k).
ObjectCollide(A, B)
  If (not bv-overlap(Abv, Bbv)) return false
  Else if (isLeaf(A))
    If (isLeaf(B))
      For each triangle pair (Ta, Tb) between the triangles contained in A and B
        If (pierce(Ta, Tb)) return true
      return false /* if no triangles pierce */
    Else /* B is not a leaf */
      For each child Cb of B
        If (ObjectCollide(A, Cb)) return true
  Else /* A is not a leaf */
    For each child Ca of A
      If (ObjectCollide(Ca, B)) return true
  return false

Figure 9.18. The recursive algorithm for collision detection/determination with 
hierarchy of bounding volumes. (Adapted from [Pal95].) 
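
For readers who prefer running code, the recursion of Figure 9.18 might be written as below. This is only a sketch: the `BVNode` class, the use of AABBs as the bounding volumes, and the idea of passing the primitive test in as the `pierce` argument are assumptions made for this illustration, not part of [Pal95].

```python
from dataclasses import dataclass, field

@dataclass
class BVNode:
    lo: tuple                 # minimum corner of the node's AABB (x, y, z)
    hi: tuple                 # maximum corner of the node's AABB
    children: list = field(default_factory=list)
    triangles: list = field(default_factory=list)  # filled only for leaf nodes

    def is_leaf(self):
        return not self.children

def bv_overlap(a, b):
    """AABB versus AABB: the intervals must overlap on all three axes."""
    return all(a.lo[i] <= b.hi[i] and b.lo[i] <= a.hi[i] for i in range(3))

def object_collide(a, b, pierce):
    """Recursive test; pierce(ta, tb) is the primitive (triangle) test."""
    if not bv_overlap(a, b):
        return False                      # quick rejection
    if a.is_leaf():
        if b.is_leaf():
            # Final step: pairwise test of the primitives in the two leaves.
            return any(pierce(ta, tb) for ta in a.triangles for tb in b.triangles)
        return any(object_collide(a, cb, pierce) for cb in b.children)
    return any(object_collide(ca, b, pierce) for ca in a.children)
```

Because `any` stops at the first true result, the traversal returns as soon as one piercing pair is found, which matches the early "return true" in the figure.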



With this basic formulation, we now look at the three remaining issues in
implementing a collision detection/determination procedure based on
hierarchical bounding volumes. The first issue is how to build various types
of bounding volumes for a given object; the second, how to construct the
hierarchy of bounding volumes for a given object; and finally, how to test
interference among the bounding volumes (before reaching the leaf node, i.e.,
bv-overlap(Abv, Bbv) in Figure 9.18).

Building a Bounding Volume 

There are many types of bounding volumes. The popular ones are the
axis-aligned bounding box (AABB), spheres, oriented bounding box (OBB), and
K-DOPs. AABBs are boxes whose faces have normals that coincide with the
standard basis (e.g., the world coordinate axes). OBBs are boxes whose faces
have normals that are pairwise orthogonal (but oriented properly to give a
tight fit), and K-DOPs are geometric entities made of K/2 slabs (pairs of
parallel planes with prefixed directions; see Figure 9.16).

AABBs can be constructed easily by identifying the extreme values (minimum
and maximum) in the respective directions of the basis vectors, $x_{min}$,
$x_{max}$, $y_{min}$, $y_{max}$, $z_{min}$, and $z_{max}$, and building the
box with eight vertices. AABBs are simple and can be made quickly on the fly
(whenever the object changes its orientation, or in the worst case, at every
simulation tick), but because the orientation of the box is fixed, it may
not be very tight fitting. This can cause inexactness (if only one bounding
box was used) or waste of computation time (if a hierarchical AABB was used)
in the collision detection/determination procedure. As shown later, checking
for collision between two AABBs is also quite simple and fast.

Spheres also present a similar case. A bounding sphere can be defined with a
radius value such that the sphere encloses the AABB of the object. Finding
the appropriate center/radius takes a bit more calculation than just
computing an AABB. However, checking for two spheres in collision is very
simple (measure the distance between the two centers of the spheres), and
spheres do not need to be reconstructed as objects change their orientation.
Spheres are in general even less tight fitting than AABBs, and thus
introduce further inexactness or waste of computation.

The normals (or orientations) of the faces of the OBBs are determined by
the shape of the object in order to find a tight-fitting box around the
object. Thus, an OBB is more difficult and time consuming to construct;
however, once constructed, it is not necessary to build it again (whenever
the object moves or rotates, the OBBs follow). Because it has a tighter fit,
it causes less inexactness (if only one OBB was used) or less waste in
computation time (if a hierarchy of OBBs was used) in the collision
detection/determination procedure. However, checking for collision between
two OBBs is a bit more time consuming than for AABBs.

To construct an OBB of a given object, we must find three local orthogonal
axes that constitute the orientations of the faces of the OBB, and that
reasonably tight-fit the object (see Figure 9.19). A good candidate is the
set of eigenvectors of the covariance matrix of the vertices of the object.
Computing the covariance matrix yields the direction along which the x-, y-,
and z-coordinates most often tend to vary together (i.e., the direction
along which x, y, z are most correlated). At the same time it reveals the
direction along which the coordinates are least correlated (which is
perpendicular to the first). These directions represent the directions of
the most and least elongation of the object [Got96]. To reduce the
unnecessary influence of concave vertices (if the object was concave) on the
computation of the elongation directions, we can compute the convex hull 3
of the vertices of the triangles. This takes O(n log n) time.

Figure 9.19. Constructing an OBB.

Now let the i-th triangle of the convex hull have the vertices $p_i$, $q_i$,
and $r_i$, the number of triangles in the convex hull be n, and the area of
the i-th triangle be $A_i$. $A_H$ is the surface area of the convex hull,
which is just the sum of all areas of the triangular facets of the convex
hull.

$A_i = \frac{1}{2} \left| (p_i - q_i) \times (p_i - r_i) \right|, \quad A_H = \sum_i A_i$

The centroid of the i-th triangle, $m_i$, and the centroid of the entire
convex hull, $m_H$, are given by

$m_i = (p_i + q_i + r_i)/3, \quad m_H = \sum_i A_i m_i / A_H$

The entries of the covariance matrix C are given by the following formula,
where the second subscripts j and k denote the coordinate components (for
details, see [Got96]).

$C_{jk} = \sum_i \frac{A_i}{12 A_H} \left[ 9\, m_{i,j} m_{i,k} + p_{i,j} p_{i,k} + q_{i,j} q_{i,k} + r_{i,j} r_{i,k} \right] - m_{H,j}\, m_{H,k}$

The eigenvectors of the covariance matrix C are orthogonal, and form the
axes of the OBB.

Once the axis vectors are known, we can compute the maximum and minimum
extents of the original triangle set along each axis and obtain the vertices
of the OBB similarly to constructing the AABB. The computation of the
covariance matrix C takes at most linear time, and getting the eigenvectors
from C is a constant time operation. Computing the box coordinates is also a
linear time operation. So, the convex hull operation dominates the overall
procedure, which takes O(n log n) time [Got96]. The procedure can be made
faster by skipping the convex hull step, and just using the triangles in the
original triangle set, resulting in a linear time procedure, but this yields
boxes that can have very bad fits.

K-DOPs are a generalization of the AABB. The directions of K-DOP faces are
fixed but do not necessarily coincide with those of the principal axes, and
there may be more than three directions for the faces (thus it is not
necessarily a box any more but a 2n-sided volume, where n is kept reasonably
low). A K-DOP of an object can be defined with the normal directions of the
faces and the extreme points (min and max) along those directions where the
faces will lie. The extreme points are found easily by projecting the
vertices of the object onto the respective normal direction of the faces and
looking for the minimum, $d_i^{min}$, and the maximum, $d_i^{max}$, for each
of the K/2 directions i (each face set, or slab, as it is called; see Figure
9.20). The AABB would be a special case, a 6-DOP. K-DOPs offer a tighter fit
than AABBs, but require more calculation to construct; plus, like AABBs, they
must be reconstructed whenever the object changes its location or
orientation.

Figure 9.20. Constructing a K-DOP (the vertices are projected onto each slab
direction to find the max and min).

3 Informally, a convex hull of an object is the smallest convex polyhedron that
encloses the object. Algorithms for finding a convex hull can be found in the
introductory text books for computational geometry.

Building a Bounding Volume Hierarchy 

Building a hierarchy of bounding volumes follows these basic steps in a 
recursive manner. 

1. Build a BV of the given object. 

2. If the BV contains more than a preset threshold of geometric primitives 
(such as triangles, polygons, vertices), split the BV into k children by some 
criterion. Recurse above for each child BV. Otherwise, stop. 

The most straightforward criterion for the way the split is made is to split
the bounding volume in half (or into n equal subspaces). The only
requirement would be that the k subspaces collected together entirely
encompass the original object, and preferably not overlap each other too
much (to avoid redundancy). Figures 9.21 and 9.22 show the hierarchy
construction process. For symmetric bounding volumes such as spheres, boxes,
and ellipsoids, it is easy to decompose the volume into halves or n equal
subspaces. For irregularly shaped bounding volumes such as the K-DOPs, a
heuristic criterion often used is cutting in half along the x-, y-, or
z-axis (the principal axes), whichever minimizes the sum of the subvolumes,
or cutting along the slab direction along which the spread of the projected
vertices is the largest [Mol02].

Figure 9.21. Constructing an AABB tree and a sphere tree.

Figure 9.22. Constructing an OBB tree [Got96]. (Reprinted with permission from the
ACM © 2004.)



Testing Interference Among Bounding Volumes 



Ray Versus Spherical BV 

Between a spherical bounding volume and a ray, we can make a figure such
as that shown in Figure 9.23. The ray may meet with a sphere at zero, one, or
two points. The radius of the sphere, R, is equated to the distance between
the intersection point and the center, c:

$R^2 = |o + t\,d - c|^2$

This results in a simple quadratic equation and can be solved for the values
of t. If there is no real valued solution for t, then the ray does not meet
with the sphere.

Figure 9.23. Ray versus sphere and its application.
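
Expanding the equation gives the quadratic (d·d) t² + 2 d·(o − c) t + |o − c|² − R² = 0, which the following sketch solves directly. The function name and the choice to discard negative values of t (points behind the ray origin) are assumptions made for this example; d is assumed to be nonzero.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ray_sphere(o, d, c, radius):
    """Return the sorted ray parameters t >= 0 where o + t d meets the sphere."""
    oc = sub(o, c)
    a = dot(d, d)
    b = 2.0 * dot(oc, d)
    k = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * a * k
    if disc < 0:
        return []                       # no real solution: the ray misses
    root = math.sqrt(disc)
    # A set removes the duplicate when the ray is tangent (disc == 0).
    ts = sorted({(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)})
    return [t for t in ts if t >= 0]
```

For a ray from the origin along the x-axis and a unit sphere centered at (5, 0, 0), the function returns the entry and exit parameters 4 and 6.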

Ray Versus Box (or Slab) [Mol02]

To check intersection between a ray and a box (or a slab), we check the
intersection between an infinite slab (composed of two infinite and parallel
planes that bound one dimension of the box) and a ray (see Figure 9.24).
The two infinite planes are represented by

$a_u \cdot x = d_1$ and $a_u \cdot x = d_1 + L$,

where L represents the (perpendicular) distance between the two planes of the
slab and $a_u$ is the normal vector of the slab. We can re-represent the same
equations with the origin translated to $a_c$, where $a_c$ is the centroid of
the box, as

$a_u \cdot x' = L/2, \quad a_u \cdot x' = -L/2$

Figure 9.24. Ray versus slab.

We equate x' with the equation of the ray, also represented with the origin
translated to $a_c$. Thus the equation of the ray, $o + t\,d$, becomes
$x' = o + t\,d - a_c = t\,d - p$, where $p = a_c - o$.

$a_u \cdot (t\,d - p) = L/2, \quad a_u \cdot (t\,d - p) = -L/2, \quad \text{and} \quad t_{1,2} = (a_u \cdot p \pm L/2) / (a_u \cdot d)$

Once the values of t are found for all the dimensions (slabs) of the box
against the ray, they can be checked against the valid range that defines
the original line segment to finally determine whether the line segment
penetrates the slab volume.
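
One common way to combine the per-slab values of t is to keep the largest entry value and the smallest exit value, and report a hit only if that combined interval is non-empty. The sketch below follows this idea; the function names, the handling of rays parallel to a slab, and the default valid range [0, ∞) are assumptions made for illustration, and the slab normals are assumed to be unit vectors.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ray_slab(o, d, a_u, a_c, length, eps=1e-12):
    """Return (t_in, t_out) for one slab, or None if the ray never enters it."""
    p = sub(a_c, o)
    e = dot(a_u, p)
    f = dot(a_u, d)
    if abs(f) < eps:
        # Ray parallel to the slab: inside for every t, or for none.
        return (-math.inf, math.inf) if abs(e) <= length / 2 else None
    t1, t2 = (e + length / 2) / f, (e - length / 2) / f
    return (min(t1, t2), max(t1, t2))

def ray_box(o, d, a_c, axes, lengths, t_lo=0.0, t_hi=math.inf):
    """Intersect the slab intervals; [t_lo, t_hi] is the valid range of the ray."""
    for a_u, length in zip(axes, lengths):
        span = ray_slab(o, d, a_u, a_c, length)
        if span is None:
            return None
        t_lo, t_hi = max(t_lo, span[0]), min(t_hi, span[1])
        if t_lo > t_hi:
            return None                 # intervals no longer overlap: a miss
    return (t_lo, t_hi)
```

Passing a finite `t_hi` restricts the test to a line segment, which corresponds to checking t against the valid range described above.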

Sphere Versus Sphere 

Checking interference between two spheres is simple: we compare the distance
between the two centers, $|c_2 - c_1|$, to the sum of their radii,
$(r_1 + r_2)$. If the former is greater, then there is no overlap.

Sphere Versus Box 

Although rare, one might have a set of objects that are made of different 
types of bounding volumes, for instance, one with sphere(s) and the other 
with box(es). To test interference between a sphere and an AABB, the 
following simple steps can be used. 

1. Test if the center of the sphere is within the AABB bounds (i.e., test if
$c_i$ is between $a_i^{min}$ and $a_i^{max}$ for all i, where i = x, y, z).

2. If the center is inside the AABB, there is collision; otherwise, calculate
the distance from the center of the sphere to the AABB along the x-, y-, and
z-axes. If they are all less than the radius of the sphere, then there is
interference. Otherwise, there is no interference.
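
Note that comparing the per-axis distances individually is conservative: near a corner of the box it can report interference even though the sphere does not reach the box. A commonly used variant, sketched below, accumulates the squared per-axis distances and compares the sum with the squared radius. This is offered as an alternative under our own naming, not as the procedure stated above.

```python
def sphere_aabb_overlap(center, radius, box_min, box_max):
    """True if the sphere touches or overlaps the axis-aligned box."""
    sq_dist = 0.0
    for c, lo, hi in zip(center, box_min, box_max):
        if c < lo:
            sq_dist += (lo - c) ** 2    # center is below the box on this axis
        elif c > hi:
            sq_dist += (c - hi) ** 2    # center is above the box on this axis
        # Otherwise the center lies within the box bounds on this axis.
    return sq_dist <= radius ** 2
```

If the center is inside the box, every per-axis term is zero, so the first step of the procedure is covered by the same loop.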

AABB Versus AABB 

Checking for interference between two AABBs is also quite simple, requiring
few comparisons among the extreme points of the respective AABBs (see
Figure 9.25). Given two AABBs of objects A and B, let's call $a_i$ the
coordinate value of the AABB of A in direction i, and similarly $b_i$ for B.
For the two AABBs to be disjoint, it suffices to check if $a_i^{min}$ is
greater than $b_i^{max}$ or $b_i^{min}$ is greater than $a_i^{max}$ for any
i = x, y, z. If the above condition is true for any i, then the AABBs are
disjoint, and otherwise, there is an overlap.
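
A minimal sketch of both the construction of an AABB from a vertex list and the disjointness test is given below. The representation of a box as a (min corner, max corner) pair and the function names are assumptions made for this example.

```python
def aabb_of(vertices):
    """Build an AABB as (min_corner, max_corner) from a list of (x, y, z)."""
    xs, ys, zs = zip(*vertices)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def aabb_disjoint(a, b):
    """True if the boxes are separated along at least one axis."""
    (a_min, a_max), (b_min, b_max) = a, b
    return any(a_min[i] > b_max[i] or b_min[i] > a_max[i] for i in range(3))
```

For instance, the boxes built from [(0, 0, 0), (1, 2, 1)] and [(3, 0, 0), (4, 1, 1)] are disjoint because the x-intervals [0, 1] and [3, 4] do not overlap.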

OBB Versus OBB [Got96]

The interference checking between two OBBs is more involved and complicated;
however, it requires a relatively small amount of computation. It uses
something called the separating axis theorem [Got96]. The separating axis
theorem states that for any two arbitrary convex disjoint polyhedra A and B,
there exists a "separating" axis onto which the projections of the polyhedra
form intervals that are also disjoint.

Figure 9.25. AABB versus AABB.

Furthermore, it implies that, if A and B are disjoint, then they can be
"separated" by an axis that is orthogonal to a face of A, a face of B, or an
edge from each polyhedron (see Figure 9.26). For three-dimensional OBBs,
there is a total of 15 cases to test for: 3 separating axes that are
orthogonal to faces of A, 3 that are orthogonal to faces of B, and 9 that are
orthogonal to edges, whose directions are determined by the cross product
between one edge direction from A and the other from B (3 x 3 combinations in
total). If, among the 15 cases, a disjointness is found on the projection of
the polyhedra, we can conclude that the polyhedra themselves are disjoint
also. The formulations necessary to test these are shown in Figure 9.27.

Figure 9.26. Two polyhedra and separating axis: (a) disjoint and (b) overlapped.

Figure 9.27. Checking for overlap between two OBBs. L is given (one of the 15
possible directions of separating axes to check for), T is the vector between the
centroids of the two boxes, and $T \cdot L$ is the projection of T onto L. What
needs to be computed are $r_A$ and $r_B$. $r_A$ is the sum of the projections of
$a_1 A^1$ and $a_2 A^2$ onto L, and likewise for $r_B$. $a_i$ is the half length of
box A in the i-th direction of the box, $A^i$. A check is made to see if $r_A$ and
$r_B$ overlap. (Reprinted from [Got96] with permission from ACM © 2004.)
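
The 15-axis test can be sketched compactly as follows. This is an unoptimized illustration, not the formulation of [Got96], which precomputes the relative rotation between the boxes. The box representation (center, three unit axes, three half lengths) and the function names are assumptions. The boxes are disjoint along a candidate axis L when |T·L| exceeds r_A + r_B.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def obb_overlap(center_a, axes_a, half_a, center_b, axes_b, half_b, eps=1e-9):
    """Separating axis test for two oriented boxes (axes are unit vectors)."""
    t = sub(center_b, center_a)            # vector between the two centroids
    candidates = (list(axes_a) + list(axes_b) +
                  [cross(a, b) for a in axes_a for b in axes_b])
    for axis in candidates:
        if dot(axis, axis) < eps:
            continue                       # parallel edges: degenerate cross product
        # Half-widths of each box projected onto the candidate axis.
        r_a = sum(h * abs(dot(ax, axis)) for h, ax in zip(half_a, axes_a))
        r_b = sum(h * abs(dot(ax, axis)) for h, ax in zip(half_b, axes_b))
        if abs(dot(t, axis)) > r_a + r_b:
            return False                   # found a separating axis
    return True                            # no separating axis among the 15
```

The candidate axes need not be normalized here, because both sides of the comparison scale by the same factor.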

K-DOP Versus K-DOP

As K-DOPs are a generalization of the AABBs, so are the tests for overlap.
For all pairs of slabs between K-DOPs A and B, $S_i^A$ and $S_i^B$ (i is the
direction of the slabs and goes from 1 to K/2), if at any time the intervals
of $S_i^A$ and $S_i^B$ do not overlap, then the whole K-DOPs A and B are
disjoint. In other words, for each i from 1 to K/2, if $d_i^{A,min}$ is
greater than $d_i^{B,max}$ or $d_i^{B,min}$ is greater than $d_i^{A,max}$,
then the objects are disjoint. Here, $d_i$ is the coordinate of slab i
measured along its normal, that is, in the direction perpendicular to the
planes of the slab. If there are interval overlaps for all the directions i,
then there is an overlap between the two K-DOPs (see Figure 9.28).

Figure 9.28. Interval testing in one direction i between two slabs.
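
The sketch below builds the slab intervals of a K-DOP by projecting vertices onto a list of fixed directions and then applies the interval test. The function names and the particular direction set in the usage note are assumptions chosen for illustration.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def kdop_of(vertices, directions):
    """Return one (d_min, d_max) interval per slab direction."""
    slabs = []
    for n in directions:
        proj = [dot(n, v) for v in vertices]   # project every vertex onto n
        slabs.append((min(proj), max(proj)))
    return slabs

def kdop_disjoint(slabs_a, slabs_b):
    """True if the intervals fail to overlap in at least one direction."""
    return any(a_min > b_max or b_min > a_max
               for (a_min, a_max), (b_min, b_max) in zip(slabs_a, slabs_b))
```

With only the three coordinate directions this reduces to the AABB test. Adding a diagonal direction such as (1, 1, 0) can separate two objects whose AABBs overlap, which illustrates the tighter fit of K-DOPs.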

Summary 

As illustrated in the various methods, one must consider the nature of the
virtual environment to choose the most suitable collision detection method
in terms of desired accuracy and available computational resources. In
general, the total cost of the collision detection process can be
represented as

$T = n_v c_v + n_p c_p + n_u c_u$

where $n_v$, $n_p$, $n_u$ represent the number of bounding volumes, polygons,
and bounding volume updates required in a given method, and the "c"s
represent the respective costs for testing or updates. We have seen that
reducing $n_v$, for instance, generally may increase $n_p$, or eliminating
$n_u$ (by choosing spherical bounding volumes) can increase $n_v$. Different
styles of bounding volumes have different costs for tests and updates as
well. Thus, it is neither meaningful nor possible to rate which method is
the best.

Pondering Points 

• Try to make a more detailed comparison between the interval method and 
Canny's method for collision detection between two triangularized ob- 
jects. Note that for triangularized objects m = 3. 

• How can we modify the method of vertex piercing (Canny's method) to 
consider collision between concave objects also? 

• Propose a way to estimate the point of collision. For instance, if there is 
one vertex that penetrated a particular face, that vertex position might be 
used as an approximation of the contact position assuming that it is not 
too far penetrated from the colliding surface. What if there were two or 
multiple numbers of penetrations? What about face-to-face contacts? 

• Although this is part of the collision response problem, how can we 
determine the bounce direction upon collision? One simple method is to 
compute the reflecting direction of the incoming object moving direction. 



Chapter 10 

Simulation II: Physics-Based Motion 
and Collision Response 



In this chapter, we cover the basic knowledge required to implement realistic 
motion simulation of virtual objects based on simple physics. Realistic 
motion constitutes a big part of object behaviors and its realism affects the 
user-felt presence. Motion simulation is also essential in implementing col- 
lision response behaviors. 

The study of motion starts with the recognition that motion is generally
categorized into two types: linear and rotational. Likewise the motion
profiles are dependent on two inherent properties of the object: its mass and
moment of inertia. Mass is the measure of the body's resistance to linear
motion, and the mass moment of inertia is the measure of the body's
resistance to rotational motion. Mathematically, mass is the material density
multiplied by the infinitesimal volume, integrated over the whole object
($m = \int_V \rho \, dv$). Usually these properties must be known in advance
to make any motion-related computation. It may be necessary to recompute the
moment of inertia for different rotational axes, or due to the changing
composition of the object (the object being composed of moving subobjects;
see examples later).

Center of Gravity (COG) 

For the sake of simplicity, objects are often treated as point masses, and
the point in the object that represents the concentration of the object's
mass is called the center of gravity (or centroid, center of mass). Usually
the center of gravity is contained within the object, but there are
exceptional cases. The Center of Gravity (COG) is computed as follows. For
instance, the x-coordinate of the COG is computed as the sum of the products
of each constituent subobject's COG x-coordinate and its mass, divided by
the total mass. The total mass is simply the sum of all masses of the
subobjects (or constituent objects) making up the whole object. The formula
is similar for the y- and z-coordinates of the COG (note that the COGs and
mass properties of the subobjects would have to be provided).

$COG = \left( \sum_i cg_i \, m_i \right) / m_t$

where $cg_i$ and $m_i$ are the center of gravity and the mass of the i-th
constituent object respectively, and $m_t$ is the total mass. If the object
was not represented as a collection of smaller objects, then computing its
COG involves the use of a formula which is the generalization of the above
concept in the continuous domain. The formula integrates over the whole
volume of the object. If the object's shape is rather complex and not well
represented, the computation would be difficult. For simple-looking
symmetric geometries, COGs can be computed easily from the existing formulas
[You00]. Also note that the COG is a property local to the object and if the
object moved or rotated, its global coordinates would change.
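
As a quick worked example of the discrete formula, the sketch below computes the COG of a composite object from (mass, centroid) pairs. The function name is ours; the sample data are the masses and centroids of the car subobjects listed in Table 10.1 later in this chapter.

```python
def center_of_gravity(parts):
    """parts: list of (mass, (x, y, z)) pairs; returns (COG, total mass)."""
    total = sum(m for m, _ in parts)
    cog = tuple(sum(m * cg[k] for m, cg in parts) / total for k in range(3))
    return cog, total

# Body, driver, and four wheels (mass in kg, centroid coordinates).
car = [
    (1000, (150, 125, 20)),
    (60, (180, 125, 20)),
    (5, (100, 150, 0)),
    (5, (200, 150, 0)),
    (5, (100, 100, 0)),
    (5, (200, 100, 0)),
]
cog, total_mass = center_of_gravity(car)
```

With these values the total mass is 1080 kg and the COG is approximately (151.67, 125.0, 19.63): the driver pulls the COG slightly forward of the body centroid, and the wheels pull it slightly down.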

Moment of Inertia 

To compute the moment of inertia, one needs to take the second moment of
each subobject mass making up the body about each coordinate axis. The
second moment is the product of the mass and the square of the perpendicular
distance from the subobject mass centroid to the rotational axis. In general,
we can draw a figure as shown in Figure 10.1, where O is the COG of the
object, and n is the rotational axis.

$I = \sum_i m_i (R_i)^2 = \sum_i m_i (|r_i| \sin\theta_i)^2$
$\quad = I_{xx}\cos^2\alpha + I_{yy}\cos^2\beta + I_{zz}\cos^2\gamma + 2 I_{xy}\cos\alpha\cos\beta + 2 I_{yz}\cos\beta\cos\gamma + 2 I_{zx}\cos\gamma\cos\alpha$

where $\alpha, \beta, \gamma$ are the angles made between the rotational axis
n and the x-, y-, and z-axes, $r_i = (x_i, y_i, z_i)$ is the position of the
i-th subobject, $\theta_i$ is the angle between $r_i$ and n, and

$I_{xx} = \sum_i (y_i^2 + z_i^2)\, m_i, \quad I_{yy} = \sum_i (z_i^2 + x_i^2)\, m_i, \quad I_{zz} = \sum_i (x_i^2 + y_i^2)\, m_i$

and

$I_{xy} = I_{yx} = -\sum_i (x_i y_i)\, m_i$
$I_{xz} = I_{zx} = -\sum_i (x_i z_i)\, m_i$
$I_{yz} = I_{zy} = -\sum_i (y_i z_i)\, m_i$

Figure 10.1. Calculating the rotational moment of inertia.

For instance, the moment of inertia with respect to the x-axis would be just
(because $\alpha = 0°$, $\beta = 90°$, $\gamma = 90°$):

$I = I_{xx} = \sum_i (y_i^2 + z_i^2)\, m_i$

However, if the rotational axis was an arbitrary vector going through the
COG, computing I would involve all six components, namely, $I_{xx}$,
$I_{yy}$, $I_{zz}$, $I_{xy}$, $I_{yz}$, and $I_{zx}$. We can conveniently
represent the inertia with its components in a 3 x 3 symmetric matrix form,
and this is called the inertia matrix.



$$
\begin{bmatrix}
I_{xx} & I_{xy} & I_{xz} \\
I_{yx} & I_{yy} & I_{yz} \\
I_{zx} & I_{zy} & I_{zz}
\end{bmatrix}
$$
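
To make the relationship between the inertia matrix and the moment about an arbitrary axis concrete, the sketch below builds the matrix from point masses and checks that the quadratic form in the direction cosines of n agrees with the direct sum of mass times squared perpendicular distance. The function names and the sample masses are hypothetical; n is assumed to be a unit vector, so its components are the direction cosines.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def inertia_matrix(masses):
    """masses: list of (m, (x, y, z)); returns the 3 x 3 inertia matrix."""
    ixx = sum(m * (y * y + z * z) for m, (x, y, z) in masses)
    iyy = sum(m * (z * z + x * x) for m, (x, y, z) in masses)
    izz = sum(m * (x * x + y * y) for m, (x, y, z) in masses)
    ixy = -sum(m * x * y for m, (x, y, z) in masses)   # products of inertia
    ixz = -sum(m * x * z for m, (x, y, z) in masses)
    iyz = -sum(m * y * z for m, (x, y, z) in masses)
    return [[ixx, ixy, ixz], [ixy, iyy, iyz], [ixz, iyz, izz]]

def moment_about_axis(inertia, n):
    """I = n^T [I] n for a unit axis n through the origin."""
    return sum(n[r] * inertia[r][c] * n[c] for r in range(3) for c in range(3))

def moment_direct(masses, n):
    """Sum of m * R^2, where R = |r x n| is the distance to the axis."""
    total = 0.0
    for m, r in masses:
        rx = cross(r, n)
        total += m * (rx[0] ** 2 + rx[1] ** 2 + rx[2] ** 2)
    return total
```

For the x-axis, n = (1, 0, 0), the quadratic form reduces to the single entry I_xx, in agreement with the special case noted in the text.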



Computing the moment of inertia will require information about the mass 
properties of the subobjects. Again, similar to the case with computing the 
COG, if the object was not represented as a collection of smaller objects (i.e., 
just a single object), then computing its COG would be more complicated. 
A continuous integral must be solved over the whole volume of the object 
and if the object shape is complex and not well represented, the computation 
will be difficult. For several simple-looking symmetric geometries, simple
formulas for the mass moment of inertia exist (the product of inertia terms
such as $I_{xy}$, $I_{yz}$, and $I_{zx}$ go to zero) [You00]. Also note that the moment of
inertia is a property local to the object and relative to the designated axis 
of rotation. Assuming that the axis of rotation goes through the COG of 
the object and the local coordinate system is also placed at the COG, 
the local inertia matrix will remain constant. However, as the object moves 
and changes its location and orientation, the global inertia matrix (i.e., 
inertia matrix computed with respect to the global coordinate system) will 
change. In fact, the global inertia matrix only changes with object rotation 
(not translation). Initially, assuming that the global coordinate system 
and the local coordinate system's orientations coincide, the global and 
local inertia matrices are also the same. The relationship is expressed as 
follows. 

$${}^{W}I(t) = {}^{W}R_{O}(t)\; {}^{O}I\; {}^{W}R_{O}(t)^{T}$$

where ${}^{W}R_{O}(t)$ is the orientation matrix between the fixed-world coordinate
system and the moving object coordinate system, and ${}^{O}I$ is the local inertia matrix.

We illustrate the use of the formulas with the following example. A car is
composed of several subobjects: a body, a driver, and four wheels, with the
physical properties listed in Table 10.1. All the subobjects' shapes are
approximated as rectangular boxes (see Figures 10.2 and 10.3). Here the
objective is to compute the total moment of inertia of the composite object
around its z-axis on its centroid.

Table 10.1. Mass Properties of the Subobjects of the Car

              Body        Driver      Wheel 1    Wheel 2    Wheel 3    Wheel 4
Length        120         10          8          8          8          8
Width         50          10          3          3          3          3
Height        40          20          8          8          8          8
Weight (kg)   1000        60          5          5          5          5
Centroid      150,125,20  180,125,20  100,150,0  200,150,0  100,100,0  200,100,0

First, we compute the COG of the car from the six subobjects. 

$$m_t = 1000 + 60 + 5 + 5 + 5 + 5 = 1080$$

$$COG_{car} = \frac{\sum_i cg_i\, m_i}{m_t} = (151.7,\ 125,\ 19.63)$$

Note that the COG of the car just computed is in global coordinates. 
Assuming that the global and the local coordinate systems of the subobjects 
are aligned, the local COGs of the subobjects (relative to the local coordinate 
system of the car) are computed easily by subtracting the global COG of the 
car from the global COGs of the subobjects (as listed in the table).

Figure 10.3. The side view of the car with six subobjects: body, driver, and four
wheels.

Body's local COG:    ${}^{car}(-1.67,\ 0.0,\ 0.37)$
Driver's local COG:  ${}^{car}(28.33,\ 0.0,\ 0.37)$
W1's local COG:      ${}^{car}(-51.67,\ 25.0,\ -19.63)$
W2's local COG:      ${}^{car}(48.33,\ 25.0,\ -19.63)$
W3's local COG:      ${}^{car}(-51.67,\ -25.0,\ -19.63)$
W4's local COG:      ${}^{car}(48.33,\ -25.0,\ -19.63)$



Now, we can compute the moment of inertia matrix. For instance,

$$
\begin{aligned}
{}^{car}I_{xx} &= \sum_i (y_i^2 + z_i^2)\, m_i \\
 &= (0^2 + 0.37^2)\cdot 1000 + (0^2 + 0.37^2)\cdot 60 + [(25^2 + 19.63^2)\cdot 5]\cdot 4 \\
 &= 20352
\end{aligned}
$$

And, because initially the global and local axes are aligned (by assumption;
i.e., ${}^{W}R_{O}(t)$ is the identity matrix), the global inertia matrix and the local
inertia matrix are the same.

Similarly,

$${}^{car}I_{yy} = 108852, \qquad {}^{car}I_{zz} = 113500, \qquad {}^{car}I_{zx} = 666.67$$



Suppose that the car rotated by 90 degrees with respect to the z-axis of the
world coordinate system, and let's say the new position of the car was
expressed by

New COG of the body    = $R(t)\,(150, 125, 20) = {}^{W}(-125, 150, 20)$
New COG of the driver  = $R(t)\,(180, 125, 20) = {}^{W}(-125, 180, 20)$
New COG of the wheel 1 = $R(t)\,(100, 150, 0) = {}^{W}(-150, 100, 0)$
New COG of the wheel 2 = $R(t)\,(200, 150, 0) = {}^{W}(-150, 200, 0)$
New COG of the wheel 3 = $R(t)\,(100, 100, 0) = {}^{W}(-100, 100, 0)$
New COG of the wheel 4 = $R(t)\,(200, 100, 0) = {}^{W}(-100, 200, 0)$

New COG of the car     = $R(t)\,(151.7, 125, 19.63) = {}^{W}(-125, 151.7, 19.63)$



If we recomputed the global moment of inertia for the car with the same 
formula (the local moment of inertia does not change, as the local coordinates 
of the COGs of the subobjects do not change with rotation or translation), 



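$$
\begin{aligned}
{}^{W}I_{yy} &= \sum_i (x_i^2 + z_i^2)\, m_i \\
 &= (0^2 + 0.37^2)\cdot 1000 + (0^2 + 0.37^2)\cdot 60 + [(25^2 + 19.63^2)\cdot 5]\cdot 4 \\
 &= 20352 \neq I_{yy} \text{ (before rotation)}
\end{aligned}
$$

where

$$
R(t) = {}^{W}R_{O}(t) =
\begin{bmatrix}
0 & -1 & 0 \\
1 & 0 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$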



Similarly,

$${}^{W}I_{xx} = 108852, \qquad {}^{W}I_{zz} = 113500$$

$${}^{W}I_{yz} = {}^{W}I_{zy} = 666.67, \qquad {}^{W}I_{xz} = {}^{W}I_{zx} = 0$$

You can check whether ${}^{W}I(t) = {}^{W}R_{O}(t)\; {}^{O}I\; {}^{W}R_{O}(t)^{T}$ holds.
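The car example can be reproduced with a few lines of code. The sketch below is only an illustration under stated assumptions: each subobject is treated as a point mass at its centroid (as the sums in the text do), the products of inertia follow the negative-product convention defined earlier, and all function names are hypothetical. Under that convention the sketch returns the zx term with a negative sign; its magnitude agrees with the 666.67 quoted in the text.

```python
# Subobject data from Table 10.1: (mass in kg, centroid in global coordinates).
PARTS = [
    (1000.0, (150.0, 125.0, 20.0)),  # body
    (60.0, (180.0, 125.0, 20.0)),    # driver
    (5.0, (100.0, 150.0, 0.0)),      # wheel 1
    (5.0, (200.0, 150.0, 0.0)),      # wheel 2
    (5.0, (100.0, 100.0, 0.0)),      # wheel 3
    (5.0, (200.0, 100.0, 0.0)),      # wheel 4
]

def center_of_gravity(parts):
    """Total mass and mass-weighted average of the centroids."""
    total = sum(m for m, _ in parts)
    cog = tuple(sum(m * c[k] for m, c in parts) / total for k in range(3))
    return total, cog

def inertia_matrix(parts, origin):
    """Point-mass inertia matrix about `origin` (products of inertia negated)."""
    I = [[0.0] * 3 for _ in range(3)]
    for m, c in parts:
        p = [c[k] - origin[k] for k in range(3)]  # local coordinates
        sq = sum(v * v for v in p)
        for a in range(3):
            for b in range(3):
                I[a][b] += m * ((sq if a == b else 0.0) - p[a] * p[b])
    return I

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

total_mass, cog = center_of_gravity(PARTS)
I_local = inertia_matrix(PARTS, cog)

# Rotation by 90 degrees about the world z-axis.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
I_world = mat_mul(mat_mul(R, I_local), transpose(R))

print(total_mass, cog)
print([round(I_local[k][k]) for k in range(3)])
print([round(I_world[k][k]) for k in range(3)])
```

After the rotation the xx and yy entries trade places while the zz entry is unchanged, which is the behavior described in the text.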
Kinematics

In certain situations, it is possible to express motion without regard to forces 
acting on the respective body (e.g., when there is no interobject interaction 
such as collision). Kinematics is the study of motion without regard to the 
forces acting on the body. We first look at kinematics before we actually 
apply the laws of motion (see later sections in this chapter) to simulate the 
dynamics of interobject interaction. In kinematics, we focus on the relation- 
ships among the position, velocity, and acceleration and disregard the effects 
of force (and mass properties of the objects). Kinematics is applied to two 
major object representations: a rigid body, which is an object we assume to 
be nondeforming in its motion, and point mass, which is assumed not to 
occupy any volume (thus rotational effects are ignored). 

Linear Velocity and Acceleration 

This part is applicable to both rigid body objects and particles. (Later we 
treat rotational kinematics for rigid body objects.) Here are the elementary 
kinematic equations of motion. 

Distance traveled: $s$

Speed: $v = ds/dt$

Acceleration: $a = dv/dt$

With a constant acceleration (for instance, gravity), the equations can be 
further expanded (easily) into this form. 

$$v_2 = v_1 + a\,\Delta t$$

$$v_2^2 = 2a\,(s_2 - s_1) + v_1^2$$

$$s_2 = s_1 + v_1\,\Delta t + (a/2)\,\Delta t^2$$

With the motion occurring during a short amount of time, say $\Delta t$, $v_2$ and $s_2$ are
the new velocity and the new value of the total distance traveled after $\Delta t$, and $v_1$ and $s_1$
are the old velocity and the old value of the total distance traveled prior to the
passage of $\Delta t$ (see Figure 10.4). In multiple dimensions, the above
equations are usually solved in the different component directions (e.g., x, y, and z).

If the acceleration is not constant (e.g., $a = -k\,v\,t^2$, an acceleration that varies
with time), then a differential equation must be solved to derive a new set of
equations for the velocity and displacement.
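As a small illustration of the constant-acceleration case, the sketch below steps a falling point mass with the closed-form update and then checks the relation between the squared velocities and the distance traveled. The function name `step_constant_accel` and the numeric values are assumptions chosen for the example.

```python
def step_constant_accel(s1, v1, a, dt):
    """Advance distance and velocity over dt under constant acceleration a."""
    v2 = v1 + a * dt
    s2 = s1 + v1 * dt + 0.5 * a * dt * dt
    return s2, v2

# Free fall along one axis for 1 second, in ten steps of 0.1 s.
g = -9.81
s, v = 0.0, 0.0
for _ in range(10):
    s, v = step_constant_accel(s, v, g, 0.1)

print(s, v)                      # close to -4.905 and -9.81
print(v * v, 2.0 * g * (s - 0.0))  # the two sides of v2^2 = 2a(s2 - s1) + v1^2 with v1 = 0
```

Because the update is exact for constant acceleration, the step size does not affect the result beyond rounding; that is no longer true once the acceleration varies with time.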

Rotational Kinematics (for Rigid Body Only) 

Similar to the linear kinematic equations, here are the relationships among 
angular displacement, angular velocity, and angular acceleration (about a 
certain axis).

Figure 10.4. Linear velocities and accelerations.

$$\omega_2 = \omega_1 + \alpha\,\Delta t$$

$$\omega_2^2 = \omega_1^2 + 2\alpha\,(\Omega_2 - \Omega_1)$$

$$\Omega_2 = \Omega_1 + \omega_1\,\Delta t + (\alpha/2)\,\Delta t^2$$

Here, $\Omega_1$ and $\Omega_2$ are the before and after values of the total angular distance
traveled up to that point (in radians). $\omega_1$ and $\omega_2$ are likewise the before
and after values of the angular velocity, and $\alpha$ is the angular acceleration. If
we are simulating an object in isolation, we simply need to apply these 
equations to compute the linear and rotational motion of the object. How- 
ever, if we were to compute a collision response between two colliding 
objects, then the resulting motion would be dependent on the contact 
point of the collision. In other words, the kinetics of the objects will depend 
on the velocities and accelerations at the point of contact. 

Figure 10.5. Rotational kinematics.

A point on an object will move in a circular path around the axis of
rotation when the object rotates (see Figure 10.5). This creates a linear
motion in addition to the linear motion of the body's center of mass. The
distance traveled (arc length) by the point is computed by

$$c = \Omega\, r$$

where $r$ is the distance from the center of rotation to the point and $\Omega$ is the
angular displacement. Thus the velocity of the point along the path is
computed by simply taking the time derivative and it results in:

$$v_t = \omega \times r \qquad (\omega\, r \text{ in 2D})$$

Note that $\omega$ is a vector with a direction along the axis of rotation. Velocity $v_t$,
as a vector, is tangent to the circular path swept by the point. Another time
derivative gives

$$a_t = \alpha \times r \qquad (\alpha\, r \text{ in 2D})$$

When a particle (or point on a mass) rotates, there is another acceleration
created called the centripetal acceleration, directed toward the axis of rotation.
This acceleration is given by

$$a_n = \omega \times (\omega \times r)$$

Figure 10.5 shows the whole picture. The total linear velocity and acceleration
of the point (resulting from rotation and from the linear motion occurring at the
COG of the object) are

Total linear velocity: $v_{cg} + v_t = v_{cg} + (\omega \times r)$

Total linear acceleration: $a_{cg} + a_t + a_n = a_{cg} + a_t + (\omega \times (\omega \times r))$

The kinematic motion can be created by solving for the incremental linear 
translation and rotation at regular time intervals (simulation tick) and apply- 
ing the appropriate translation and rotation matrix to the object for rendering. 
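The velocity and acceleration of a point on a rotating body can be sketched directly from the cross products above. This is a minimal illustration; the helper names and the sample vectors are assumptions for the example.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def add(*vectors):
    """Componentwise sum of any number of 3-vectors."""
    return tuple(sum(v[k] for v in vectors) for k in range(3))

def point_velocity(v_cg, omega, r):
    """Total linear velocity of a point at offset r from the COG."""
    return add(v_cg, cross(omega, r))

def point_acceleration(a_cg, alpha, omega, r):
    """COG acceleration plus tangential and centripetal terms."""
    tangential = cross(alpha, r)
    centripetal = cross(omega, cross(omega, r))
    return add(a_cg, tangential, centripetal)

# Body moving along x at 3 units/s and spinning about z at 2 rad/s;
# the point of interest sits 1 unit from the COG along x.
v = point_velocity((3.0, 0.0, 0.0), (0.0, 0.0, 2.0), (1.0, 0.0, 0.0))
a = point_acceleration((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 2.0), (1.0, 0.0, 0.0))
print(v)  # tangential part points along +y
print(a)  # centripetal part points back toward the axis (-x)
```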

Laws of Motion 

In order to also consider the effects of forces and torques and create motion 
with dynamics, we need to first understand the basic laws of motion. 
Newton's second law states that the resulting acceleration of a body is in
the same direction as the resultant force on it, and proportional to it,

$$\sum F = m\,a$$

The above equation is usually solved in different component directions
(e.g., x, y, and z).

$$\sum F_x = m\,a_x, \qquad \sum F_y = m\,a_y, \qquad \sum F_z = m\,a_z$$

Another important equation of motion has to do with the object's momentum
$G$, which equals the mass times velocity. In fact, the rate of change of
momentum is equal to the total force exerted on the body. Thus, momentum
is conserved: that is, without introduction of new forces, the momentum will
stay constant.

$$G = m\,v, \qquad dG/dt = m\,a = \sum F$$

If the object is composed of subobjects, the momentum at the COG is
formulated as the sum of the momenta of the subobjects, which equals the
total mass times velocity.

$$G = \sum_i m_i\,(dq_i(t)/dt) = m\,v$$

The above equations are valid for translational (or linear) movement; $dq_i(t)/dt$
stands for the velocity of the $i$th subobject. As for rotational movement,
there is another set of equations. The first relates the total torque on the
body to the angular acceleration (analogous to $\sum F = m\,a$; see Figure 10.6).

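$$\text{Torque about COG} = r \times F$$

$$\text{Total torque} = {}^{W}I(t)\,\alpha$$

where $F$ is the force acting on the body and $r$ is the vector from the COG to the
point where $F$ is applied (so the magnitude of the torque is $|F|$ times the
perpendicular distance from the COG to the line of action of $F$). ${}^{W}I(t)$ is the
inertia matrix (with respect to the world), and $\alpha$ represents the angular acceleration.

The angular momentum $L$ is given as follows, and is also conserved (i.e., it
stays constant if no new torque is introduced).

$$
\begin{aligned}
L &= \sum_i \left( r_i(t) \times m_i\,(dq_i(t)/dt - v(t)) \right) \\
  &= \sum_i \left( {}^{W}R_{O}(t)\,l_i \times m_i\,(\omega(t) \times r_i(t)) \right)
\end{aligned}
$$

and

$$\text{Total torque} = {}^{W}I(t)\,\alpha = dL(t)/dt, \qquad \omega(t) = {}^{W}I(t)^{-1}\,L(t)$$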

where 

$x(t)$: COG of the whole object in global coord.

$v(t)$: Linear velocity of the whole object at the global COG

$q_i(t)$: COG of the $i$th subobject in global coord.

${}^{W}R_{O}(t)$: Rot. matrix between global and local coord. of whole object

$l_i$: Local coordinate of the $i$th subobject

$\omega(t)$: Angular velocity of the whole object around the rotation axis

$r_i(t)$: Vector from whole object's COG to $i$th subobject's COG

Also note that

$$q_i(t) = {}^{W}R_{O}(t)\,l_i + x(t)$$

$$dq_i(t)/dt = \omega(t) \times {}^{W}R_{O}(t)\,l_i + v(t)$$

$$r_i(t) = q_i(t) - x(t) = {}^{W}R_{O}(t)\,l_i$$

$${}^{W}I(t) = {}^{W}R_{O}(t)\; {}^{O}I\; {}^{W}R_{O}(t)^{T}$$

Even though the angular momentum is conserved with no new torque
introduced to the system, the angular velocity $\omega(t)$ may change due to object
rotation (i.e., ${}^{W}I(t)$ changes).

These equations are quite complicated. Let us go through the following 
example to see how these equations of motion are applied for simulating 
movement. Here is a situation where the car example of Figure 10.3 is
"pushed" with an initial linear and rotational momentum (or equivalently 
given initial linear and angular velocity values). By applying the above 
equations in an incremental fashion we can compute the car's new location 
and orientation at each simulation tick (see Figure 10.7). 




Figure 10.7. The car starting to move and rotate with given initial values of linear
and angular momentum (equivalent to saying that the car is given an initial linear
and angular velocity).

Table 10.2. State Data for the Car Object

State Variable     Description                                       Initial Value
$t$                Current time (e.g., tick value)                   0
$x(t)$             COG of the whole object in global coord.          (151.7, 125, 19.63), from Figure 10.3
${}^{W}R_{O}(t)$   Rot. matrix between global and local coord.       Identity matrix (global and local coord.
                   of whole object                                   orientations are the same initially)
$m_t$              Total mass                                        1080, from Figure 10.3
${}^{O}I$          Local inertia matrix                              From p. 184
$l_i$              Local coords. of subobjects                       From p. 183
$P$                Total linear momentum (stays constant in this     (1000, 0, 0), arbitrarily given,
                   example, no new force)                            velocity toward x direction
$L$                Total angular momentum (stays constant in this    (0, 0, 10000), arbitrarily given,
                   example, no new torque)                           rotation around z axis



For simulation purposes we keep a record representing the state of the
whole object (car). The state consists of the information in Table 10.2. The
purpose of the simulation is to update $x(t)$ and ${}^{W}R_{O}(t)$ (the new position
and orientation) of the car. Thus, using the equations of motion, at every
simulation tick, we compute the state variables listed in Table 10.3. Then,
finally, to compute the new $x(t)$ and ${}^{W}R_{O}(t)$, we compute $dx$ and $dR$ as
follows. Here, we use the formula $dR/dt = \omega(t) \times {}^{W}R_{O}(t)$, which relates the
angular velocity ($\omega$) to the rotation matrix ($R$), to compute the incremental
change in $R$ (the cross product is taken with each column of $R$).

$$dx = v(t)\,\Delta t$$

$$x(t) = x(t - 1) + dx$$

$$dR = [\omega(t) \times {}^{W}R_{O}(t - 1)]\,\Delta t$$

$${}^{W}R_{O}(t) = \text{Normalize}\left({}^{W}R_{O}(t - 1) + dR\right)$$

Note that the result obtained from the incremental update of ${}^{W}R_{O}$ is
normalized columnwise to keep it (approximately) an orthonormal matrix (by
dividing each column by its norm). Also see Table 10.4.



Table 10.3. Other State Variables Derivable (at Each Simulation Tick) from
Variables in Table 10.2

What to compute    Description                  How to compute
${}^{W}I(t)$       Global inertia matrix        ${}^{W}R_{O}(t-1)\; {}^{O}I\; {}^{W}R_{O}(t-1)^{T}$
$F$, $T$           Any new force or torque      For now there is none, but if there were a
                                                collision, we need to figure this out
$v(t)$             Global velocity              $P(t-1)/m_t$
$\omega(t)$        Angular velocity             ${}^{W}I(t)^{-1}\,L(t-1)$
$dq_i(t)/dt$       Velocity of ith subobject    $v(t) + \omega(t) \times r_i(t-1)$, where
                                                $r_i(t-1) = {}^{W}R_{O}(t-1)\,l_i$

Table 10.4. Traces of the simulation from t = 0 to t = 3.

Time     x(t)                    ${}^{W}R_{O}(t)$
t = 0    (151.7 125.0 19.63)     Row 1:   1       0       0
                                 Row 2:   0       1       0
                                 Row 3:   0       0       1
t = 1    (152.6 125.0 19.63)     Row 1:   0.99   -0.1     0
                                 Row 2:   0.1     0.99    0
                                 Row 3:   0       0       0.99
t = 2    (153.5 125.0 19.63)     Row 1:   0.98   -0.2     0
                                 Row 2:   0.2     0.98    0
                                 Row 3:   0       0       1.0
t = 3    (154.4 125.0 19.63)     Row 1:   0.96   -0.26    0
                                 Row 2:   0.26    0.96    0
                                 Row 3:   0       0       1.0
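The incremental update can be sketched in code as follows. This is a minimal illustration rather than the exact program behind Table 10.4: the inertia entries are the values quoted in the text, a tick duration of 1 is assumed (it appears to match the position increments in the table), and all function names are hypothetical. With no new force or torque, the momenta P and L stay constant.

```python
import math

# Values quoted in the text (local inertia matrix entries, Table 10.2).
I_LOCAL = [[20352.0, 0.0, 666.67],
           [0.0, 108852.0, 0.0],
           [666.67, 0.0, 113500.0]]
TOTAL_MASS = 1080.0
P = (1000.0, 0.0, 0.0)    # total linear momentum (constant here)
L = (0.0, 0.0, 10000.0)   # total angular momentum (constant here)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def inverse3(M):
    """Inverse of a 3 x 3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def normalize_columns(M):
    out = [row[:] for row in M]
    for c in range(3):
        n = math.sqrt(sum(M[r][c] ** 2 for r in range(3)))
        for r in range(3):
            out[r][c] = M[r][c] / n
    return out

def tick(x, R, dt):
    """One simulation tick: derive v and omega, then update x and R."""
    I_world = mat_mul(mat_mul(R, I_LOCAL), transpose(R))
    v = [p / TOTAL_MASS for p in P]
    omega = mat_vec(inverse3(I_world), L)
    # dR: cross product of omega with each column of R, times dt.
    d_cols = [cross(omega, [R[r][c] for r in range(3)]) for c in range(3)]
    R_new = [[R[r][c] + d_cols[c][r] * dt for c in range(3)] for r in range(3)]
    x_new = [x[k] + v[k] * dt for k in range(3)]
    return x_new, normalize_columns(R_new)

def simulate(ticks, dt=1.0):
    x = [151.7, 125.0, 19.63]
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for _ in range(ticks):
        x, R = tick(x, R, dt)
    return x, R

x3, R3 = simulate(3)
print([round(c, 2) for c in x3])
print([[round(c, 2) for c in row] for row in R3])
```

The printed orientation should be close to the t = 3 entry of Table 10.4; small differences in the last digit are expected because the table values are rounded.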



Dynamics 

Before incorporating forces into our motion equations (for a more realistic
simulation of moving objects), we first should recognize the two major types
of forces. One is the contact force that is exerted through direct contact
(e.g., object collision, holding a book). The other type, the noncontact force,
acts on objects through a force field without touching them. Such force
fields are usually uniformly present throughout a large space (such as
gravitational or magnetic fields). Most often, we only consider contact
forces in simulating rotational dynamics. Forces from force fields are
assumed to act only on the COM of the object and thus do not cause any
rotational dynamics (of course this is not true in reality).

In considering object dynamics, the first line of business is figuring out the 
major forces acting on the object under consideration. From the information 
about forces, we derive the acceleration and apply the kinematic equations 
(if somehow the accelerations are known, then the problem reduces to just 
the kinematics problem). To summarize, the general procedure for solving 
object dynamics is as follows. 

1. Calculate the body's mass properties (mass, COM, moment of inertia). 

2. Identify and quantify all major forces and moments acting on the body
(and sum them) and apply $\sum F = m\,a$ and $\sum M$ (total torque) $= I\,\alpha$ to
solve for the linear and angular accelerations.

3. Integrate with respect to time to obtain velocities and displacements. 

4. Check for and handle collision. 

Thus, for Step 2, the newly found forces and torques are updated (the F, T
row of Table 10.3) and the resulting change in linear and angular momentum
is updated in the object state (last two rows of Table 10.2). The rest (Step 3
above) is the same as in computing the kinematics. Thus the major problem
in dynamics lies in finding the appropriate forces and torques given
information regarding the collision (e.g., collision point).

If there are multiple objects, then they must be treated one by one, and 
because we are assuming that the above procedure will occur at a very high 
rate with small simulation tick duration, the order of object consideration 
will not matter very much. 
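The general procedure can be sketched for the simplest case, a point mass falling under gravity onto a floor. Everything here is an assumption made for illustration: the state layout, the function names, the floor at z = 0, and the ad hoc bounce factor used in Step 4. Rotational terms are left out because a field force is taken to act at the COM.

```python
GRAVITY = (0.0, 0.0, -9.81)

def dynamics_step(state, contact_forces, dt, floor_z=0.0, c=0.8):
    """One tick of the procedure for a point mass (Step 1 is just the mass)."""
    m = state["mass"]
    # Step 2: sum all forces and solve sum(F) = m * a.
    forces = [tuple(m * g for g in GRAVITY)] + list(contact_forces)
    a = [sum(f[k] for f in forces) / m for k in range(3)]
    # Step 3: integrate to obtain the new velocity and position.
    v = [state["v"][k] + a[k] * dt for k in range(3)]
    x = [state["x"][k] + v[k] * dt for k in range(3)]
    # Step 4: check for and handle collision with the floor (ad hoc response).
    if x[2] < floor_z and v[2] < 0.0:
        x[2] = floor_z
        v[2] = -c * v[2]
    return {"mass": m, "x": x, "v": v}

def simulate_drop(steps, dt=0.01):
    """Drop a 2 kg point mass from a height of 1 and record every state."""
    state = {"mass": 2.0, "x": [0.0, 0.0, 1.0], "v": [0.0, 0.0, 0.0]}
    history = [state]
    for _ in range(steps):
        state = dynamics_step(state, [], dt)
        history.append(state)
    return history

history = simulate_drop(200)
print(min(s["x"][2] for s in history))  # never below the floor
```

With several objects, the same step would simply be applied to each object in turn at every tick.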



Ad Hoc Collision Response 

A simple nonphysics-based approach may simply generate $v(t+1)$ based on
some heuristics (see Figure 10.8). For instance, the direction of $v(t+1)$ may
just be the direction of the incoming velocity reflected about the surface
normal. The magnitude may be adjusted to simulate energy absorption
($|v(t+1)| = c\,|v(t)|$, with $0 < c < 1$).

A similar approach is to introduce a new spring force proportional to the 
amount of penetration upon collision. The direction of the force would be 
normal to the penetrating polygon surface. This method is similar to the 
simple haptic rendering introduced in Chapter 6. This approach is illustrated 
in Figure 10.9. 
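Both ad hoc responses fit in a few lines. The sketch below assumes a unit surface normal n and uses hypothetical function names; the spring constant and the damping factor c are arbitrary example values.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def reflect_velocity(v, n, c):
    """Mirror v about the unit normal n and scale its magnitude by c."""
    vn = dot(v, n)
    return tuple(c * (v[k] - 2.0 * vn * n[k]) for k in range(3))

def penalty_spring_force(depth, n, k):
    """Spring force along n, proportional to the penetration depth."""
    if depth <= 0.0:
        return (0.0, 0.0, 0.0)  # no penetration, no force
    return tuple(k * depth * n[i] for i in range(3))

# An object hitting a floor whose normal is +y, losing 20% of its speed.
v_after = reflect_velocity((1.0, -2.0, 0.0), (0.0, 1.0, 0.0), 0.8)
# The same floor penetrated by 0.05 units, with spring constant 200.
force = penalty_spring_force(0.05, (0.0, 1.0, 0.0), 200.0)
print(v_after, force)
```

The reflection approach changes the velocity in a single tick, whereas the spring force feeds into the regular force summation and pushes the object out over several ticks.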



The Impulse-Momentum Principle [Bou01]

The physics-based collision response computation is based on the
Impulse-Momentum principle. Impulse is defined as a force that acts over a very
short period of time. The principle states that (linear) impulse is equal to the
change in momentum. For the rotational aspect, angular impulse is equal to
the change in rotational momentum.

Figure 10.8. A nonphysics-based ad hoc approach to collision response. (Labels
in the figure: v(t) - v(t)*n and v(t)*n.)

Figure 10.9. Introducing spring force normal to penetrated polygon surface.
(Labels in the figure: new projectile, spring force.)

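$$\text{Linear impulse: } m(v^{+} - v^{-}) = \int \sum f \, dt$$

$$\text{Rotational impulse: } I(\omega^{+} - \omega^{-}) = \int \sum \tau \, dt$$

where $v^{+}$, $v^{-}$, $\omega^{+}$, $\omega^{-}$ are the linear and rotational velocities after and
before the short amount of time ($\Delta t = t^{+} - t^{-}$) over which the impulse occurs, $m$ is the
object mass, and $I$ is the inertia matrix. In easier terms:

$$\sum f = F = m(v^{+} - v^{-})/\Delta t$$

$$\sum \tau = M = I(\omega^{+} - \omega^{-})/\Delta t$$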

In addition to this principle, another required law of physics is the
conservation of momentum, which states that when the rigid bodies of a system
collide, the total momentum is conserved. In an equation,

$$m_1 v_1^{-} + m_2 v_2^{-} = m_1 v_1^{+} + m_2 v_2^{+}$$

where $m_1$ and $v_1$ refer to the mass and velocity of object 1 along the line of
action, and likewise $m_2$ and $v_2$ for object 2. Note that the velocities used in the
calculation must be the velocities (projected) along the line of action. The
line of action is the line connecting the COGs of the two objects. Thus if two
moving objects collided in an angle, the velocity components in the direction 
of the line of action can be obtained by simple projection. Thus, note that the 
"after" velocities computed with the above equation will also be the velocity 
components projected along the line of action. How do we then find the 
actual "after" directions of the objects? This is done by using an assumption 
that there is zero impact along the tangential direction of the collision point 
(direction perpendicular to the line of action), and the velocity along that 
line is preserved after the collision. For instance, if two objects collided head 
on, then the after velocities will be the very opposite (with no tangential 
components). Therefore, there will be two components of the "after" vel- 
ocities, one along the line of action and another along the tangential direc- 
tion. Each of those components can be projected back to the principal axis 
and summed to obtain the final velocity vector (since we know the angles 
among the principal axis, the incoming directions and the line of action). 

When dealing with rigid bodies that rotate, we have to derive a new
equation for impulse that includes the angular effects. We need to recall
here that linear velocity is affected by rotation. Figure 10.10 shows the
situation. The line of action, when angular effects are considered, becomes
the line perpendicular to the colliding surfaces (rather than the line between
two COGs). Assuming that the line of action can be found (through the
collision determination process), the velocities used in the subsequent equations
are the components of the actual velocities projected onto the line of action.

Figure 10.10. Incorporating both linear and rotational kinematics.

$$\text{Total linear velocity: } v = v_{cg} + v_t = v_{cg} + \omega \times r$$

where $r$ is the vector from the COM to the point of contact (where the
force is applied), and $\omega$ is the angular velocity. For convenience,
we drop the subscript "cg".

For body 1 and 2, the linear impulses are:

Body 1: $M_1[(v_1^{+} + \omega_1^{+} \times r_1)\cdot n - (v_1^{-} + \omega_1^{-} \times r_1)\cdot n] = F\,\Delta t = J$

Body 2: $M_2[(v_2^{+} + \omega_2^{+} \times r_2)\cdot n - (v_2^{-} + \omega_2^{-} \times r_2)\cdot n] = -F\,\Delta t = -J$

For these impulses, there is one contact force, $F$ (acting over a short time $\Delta t$);
from body 1's perspective it is $F$, and from body 2's perspective it is $-F$.
The rotational impulses are:

Body 1: $I_1(\omega_1^{+} - \omega_1^{-}) = r_1 \times (F\,\Delta t) = r_1 \times J$

Body 2: $I_2(\omega_2^{+} - \omega_2^{-}) = r_2 \times (-F\,\Delta t) = r_2 \times (-J)$

In addition to these equations, we can implement a heuristic approach to
account for the actual loss of energy when collision occurs. We establish a
coefficient of restitution, $e$, to represent the degree of "elasticity" of the collision.

$$e = -\frac{v_1^{+} - v_2^{+}}{v_1^{-} - v_2^{-}}$$

The minus sign keeps $e$ positive, because the relative velocity along the line of
action reverses direction in a collision. The higher the value of $e$, the more
elastic the collision; at the extreme $e = 1$, there is no loss of energy in the
collision. The lower the value of $e$, the more inelastic the collision; at the
extreme $e = 0$, the two objects simply stick together (they do not bounce off,
and the relative motion along the line of action is fully absorbed). Different $e$
values can be used depending on the material properties of the objects in question.

Collision Response 

When a collision is detected, the resulting velocities should be updated
according to the physical principles that govern the conservation of
momentum. They were:

Body 1: $M_1[(v_1^{+} + \omega_1^{+} \times r_1)\cdot n - (v_1^{-} + \omega_1^{-} \times r_1)\cdot n] = F\,\Delta t = J$

Body 2: $M_2[(v_2^{+} + \omega_2^{+} \times r_2)\cdot n - (v_2^{-} + \omega_2^{-} \times r_2)\cdot n] = -F\,\Delta t = -J$

$e = -(v_1^{+} - v_2^{+})/(v_1^{-} - v_2^{-})$

Remember that the total velocity equals $v + \omega \times r$. Also remember that the
equations must be applied along the line of action (denoted as vector $n$).
Disregarding the rotational effects for the time being ($\omega \times r = 0$), upon
detection of collision, we compute $J$ as:

$$J = -\frac{[(v_1^{-} - v_2^{-})\cdot n]\,(e + 1)}{1/M_1 + 1/M_2}$$

$$v_1^{+}\cdot n = v_1^{-}\cdot n + J/M_1$$

$$v_2^{+}\cdot n = v_2^{-}\cdot n - J/M_2$$

The actual $v_1^{+}$, $v_2^{+}$ are found in the same way as explained in the previous
section, using the assumption that there is zero impact along the tangential
direction of the collision point (direction perpendicular to the line of action),
and the velocity along that line is preserved after the collision.
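The rotation-free case can be sketched as follows. The function name and the sample values are assumptions for illustration, and n is taken to be a unit vector along the line of action. Only the velocity components along n are changed, so the tangential components are preserved as described above.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def collide_no_rotation(m1, v1, m2, v2, n, e):
    """Return (J, v1_after, v2_after) for two bodies, ignoring rotation."""
    v_rel_n = dot([v1[k] - v2[k] for k in range(3)], n)
    J = -v_rel_n * (e + 1.0) / (1.0 / m1 + 1.0 / m2)
    # Apply +J to body 1 and -J to body 2, along n only.
    v1_after = tuple(v1[k] + (J / m1) * n[k] for k in range(3))
    v2_after = tuple(v2[k] - (J / m2) * n[k] for k in range(3))
    return J, v1_after, v2_after

# Equal masses, perfectly elastic, body 2 initially at rest:
# the normal components are exchanged, the tangential one is untouched.
J, v1_after, v2_after = collide_no_rotation(
    1.0, (2.0, 1.0, 0.0), 1.0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 1.0)
print(J, v1_after, v2_after)
```

Flipping the direction of n changes the sign of J but leaves the resulting velocities the same, so the orientation chosen for the line of action does not matter here.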

If we add the rotational effect, the collision detection module must produce
the location of contact, i.e., $r_1$ and $r_2$ (we will not cover how to do this
in this book in detail, although some techniques in the collision detection
section can be used to approximate this location), and the impulse $J$ can be
computed as (derived from the original set of equations):

$$J = -\frac{[(v_1^{-} - v_2^{-})\cdot n]\,(e + 1)}{1/m_1 + 1/m_2 + n\cdot\{[(r_1 \times n)/I_1] \times r_1\} + n\cdot\{[(r_2 \times n)/I_2] \times r_2\}}$$

Once $J$ is computed, the $v$'s and $\omega$'s after the collision are computed with the
formulas presented earlier (in the last two equations, $J$ stands for the impulse
vector $J\,n$).

$$v_1^{+}\cdot n = v_1^{-}\cdot n + J/M_1$$

$$v_2^{+}\cdot n = v_2^{-}\cdot n - J/M_2$$

$$\omega_1^{+} = \omega_1^{-} + (r_1 \times J)/I_1$$

$$\omega_2^{+} = \omega_2^{-} + (r_2 \times (-J))/I_2$$
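A sketch of the impulse computation with rotation follows. It rests on several assumptions that go beyond the text: the moments of inertia are scalars (as for a sphere, or for 2D motion), the velocity in the numerator is read as the velocity of each contact point (COG velocity plus the rotational contribution), n is a unit vector, and all names are hypothetical.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def add(a, b):
    return tuple(a[k] + b[k] for k in range(3))

def sub(a, b):
    return tuple(a[k] - b[k] for k in range(3))

def scale(a, s):
    return tuple(a[k] * s for k in range(3))

def collide_with_rotation(m1, I1, v1, w1, r1, m2, I2, v2, w2, r2, n, e):
    """Return (v1, w1, v2, w2) after the collision.

    v*: COG velocities, w*: angular velocities, r*: COG-to-contact vectors,
    I*: scalar moments of inertia (an assumption of this sketch).
    """
    # Relative velocity of the two contact points along the line of action.
    p1 = add(v1, cross(w1, r1))
    p2 = add(v2, cross(w2, r2))
    v_rel_n = dot(sub(p1, p2), n)
    # Rotational terms of the denominator: n . {[(r x n) / I] x r}
    k1 = dot(n, cross(scale(cross(r1, n), 1.0 / I1), r1))
    k2 = dot(n, cross(scale(cross(r2, n), 1.0 / I2), r2))
    J = -v_rel_n * (e + 1.0) / (1.0 / m1 + 1.0 / m2 + k1 + k2)
    Jn = scale(n, J)  # impulse vector
    v1_after = add(v1, scale(Jn, 1.0 / m1))
    v2_after = sub(v2, scale(Jn, 1.0 / m2))
    w1_after = add(w1, scale(cross(r1, Jn), 1.0 / I1))
    w2_after = sub(w2, scale(cross(r2, Jn), 1.0 / I2))
    return v1_after, w1_after, v2_after, w2_after

# Hypothetical example: a spinning body 1 hits body 2 off-center.
result = collide_with_rotation(
    2.0, 1.5, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.5, -0.2, 0.0),
    3.0, 2.0, (-0.5, 0.0, 0.0), (0.0, 0.0, 0.0), (-0.4, 0.1, 0.0),
    (1.0, 0.0, 0.0), 0.5)
print(result)
```

With this reading, the relative velocity of the contact points along n after the collision equals minus e times its value before the collision, which is a convenient check on an implementation.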

Real-Time Simulation Revisited 

Now that we have covered the basic physics model, we turn to how to use it 
and implement a real-time simulation for virtual reality. Recall that there 
were two major computational structures for a VR program, one based on a 
single thread of infinite loop that handles the external input, does the 
simulation, updates the environment, and renders the scene in sequence, 
and the other, a distributed approach in which each of those components 
would run as separate threads in synchrony. Either way, the simulation
module applies the physics model, described in the previous sections, once
per small increment of time (simulation tick). Also remember the
general procedure for solving object dynamics. Once the acceleration of an
object is figured out, the next line of business is to apply integration to
obtain the object's velocity and displacement. This small increment of time ($dt$) is
the basis for our calculation of the object motion that involves differentiation
or integration of quantities with respect to time. Note that $\Delta t$ will
be the current time minus the time stamp of the last simulation
calculation; thus it must be tracked using a system timekeeping function.

We have already seen this style of motion kinematics simulation in the last 
part of the kinematics section. As for the motion dynamics, the only differ- 
ence is to figure out the new acceleration (or velocities) resulting from the 
collision using the impulse-momentum equations (or ad hoc methods). 

Within one small simulation tick, we can write:

$$dv = a\,dt \quad \text{and} \quad \int dv = \int a\,dt \quad (\text{over the small time duration } \Delta t)$$

If we replace $dt$ with $\Delta t$, then $dv = a\,\Delta t$, and the new velocity is:

$$v(t + \Delta t) = v(t) + a\,\Delta t$$

The new displacement value, after the increment of $\Delta t$, will be:

$$s(t + \Delta t) = s(t) + \Delta t\, v(t + \Delta t)$$

Even though the process is simple, it is an approximate way of carrying out
the integration of the acceleration over the short period of time to obtain
velocity and displacement. This is called Euler's method and is based on the
Taylor expansion theorem that states:

$$y(x + \Delta x) = y(x) + \Delta x\, y'(x) + \frac{\Delta x^2}{2!}\, y''(x) + \text{higher-order terms}$$

If we ignore the third and remaining terms on the right-hand side of the
equation and replace $y$ with $v$, and $\Delta x$ with $\Delta t$, we arrive at the Euler
approximation of integration. Using only up to the second term (the one that
uses $y'$) amounts to an integration in the linear sense, approximating $y$
with a linear motion model. We can certainly choose to include more terms
to reduce the error and produce a better approximation of $y$, but doing so
requires more computation.
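The effect of the tick duration on the approximation can be seen with a short experiment. The sketch below applies the two update formulas above to a free fall and compares the result with the exact constant-acceleration solution. The function names and values are assumptions for illustration. Updating the displacement with the already-updated velocity, as the formulas do, is a variant often called semi-implicit Euler.

```python
def euler_step(s, v, a, dt):
    """One Euler tick: new velocity first, then displacement from it."""
    v_new = v + a * dt
    s_new = s + dt * v_new
    return s_new, v_new

def simulate_fall(dt, duration=1.0, a=-9.81):
    """Integrate a fall from rest for `duration` seconds with tick dt."""
    s, v = 0.0, 0.0
    for _ in range(round(duration / dt)):
        s, v = euler_step(s, v, a, dt)
    return s, v

exact_s = 0.5 * -9.81 * 1.0 ** 2           # closed-form displacement
s_coarse, v_coarse = simulate_fall(0.1)    # 10 ticks
s_fine, v_fine = simulate_fall(0.001)      # 1000 ticks
print(abs(s_coarse - exact_s), abs(s_fine - exact_s))
```

The displacement error shrinks roughly in proportion to the tick duration, which is why a small simulation tick (or a higher-order method) matters for accuracy.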



Deformation (As a Result of Collision) 

So far, we have assumed that the colliding bodies were rigid, and did not 
deform upon collision. Here, we illustrate two methods for simulating a 
simple deformation effect. Deformation of an object modeled as a mesh 
amounts to moving the vertices near the point of contact. A simple form of 
deformation is illustrated in Figure 10.11. The upper part of Figure 10.11
shows the effect of moving one vertex toward the right. Nearby vertices are
attracted toward the right depending on their distances from the original
vertex that moved. Thus, when a collision occurs, the vertex near the point of
collision can be moved in proportion to the penetration distance, and the
nearby vertices can be moved in a similar way.

Figure 10.11. Deformation by moving a group of vertices proportionally. Upon collision
the vertices of the colliding objects can be moved in a similar way.
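This simple form of deformation can be sketched as follows. The linear falloff with distance and the influence radius are assumptions made for this example (the text only says that nearby vertices move depending on their distance), and the function name is hypothetical.

```python
import math

def deform(vertices, picked, displacement, radius):
    """Move vertex `picked` by `displacement`; vertices within `radius`
    follow with a weight that falls off linearly with distance."""
    center = vertices[picked]
    moved = []
    for p in vertices:
        d = math.sqrt(sum((p[k] - center[k]) ** 2 for k in range(3)))
        w = max(0.0, 1.0 - d / radius)  # 1 at the picked vertex, 0 at the radius
        moved.append(tuple(p[k] + w * displacement[k] for k in range(3)))
    return moved

# Four vertices on a line; the second one is pushed by 0.5 along +y.
mesh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
deformed = deform(mesh, 1, (0.0, 0.5, 0.0), 2.0)
print(deformed)
```

For a collision, the displacement would be taken along the contact normal with a length proportional to the penetration distance.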

There are a number of other more elegant methods of handling deform- 
ation. For instance, the motions of clothes or drapes are simulated by 
modeling the clothes as a network grid of particle masses connected by 
springs and dampers. The motion of the particles can be computed approxi- 
mately by solving the dynamics equations for each pair of particles in the 
grid in some predefined order. 

Summary 

Many typical behaviors of virtual objects are manifested by motion. Thus, 
modeling motion as realistically as possible can be important in making the 
virtual environment more believable. The motion simulation realism may 
also be important for creating the right virtual environment for correct 
training and education. Physical motion involves both linear and rotational 
components. However, including rotational components can make the modeling
process more difficult and involves more computation. Depending on
the given situation, the rotational effects may be ignored. Often, a full
physically based simulation is not needed either. Just considering
kinematics or ad hoc dynamics methods may suffice, depending on the
requirements of the system.

Pondering Points

• One way of estimating the point of contact is just to use the location of the 
penetrating vertex assuming the penetration is small relative to the size of 
the penetrated object. How can we estimate the point of contact for other 
types of collision (e.g., edge going through the face, edge to edge, face to 
face, etc.)? 

• How can we incorporate friction between objects in the dynamics compu- 
tation? 

• What kind of data structure would we need for simulating force fields 
such as winds, magnetic fields, and artificial force fields? 

• It is said that people employ "folk" physics in understanding the world. 
Your folk physics model can be quite different from what actually hap- 
pens in the world. You probably have taken a university or high school 
level physics course some time in your life. Do you really use the know- 
ledge to reason and act in the world? How good is your physical inter- 
pretation of the world? How realistic should a virtual environment be in 
terms of simulating the actual physics of the natural world? Perhaps you 
can adapt a new "folk" physics in a virtual environment. What would be 
the minimum requirement for an acceptable "folk" physics? 

• Simulate the kinematic motion for ships in the Ship Simulator example. 
You can follow the exact same process as illustrated in the text with the 
car example. For the dynamics effect, compute (estimate) the approximate 
force, acceleration, and the change in momentum to compute the new 
response motion profile. 



Chapter 11 

Virtual Characters 



Among the most important objects in VR or in digital content are
human (or living/moving) characters. Characters can contribute to making a
virtual environment more believable. After all, we do live in a world crowded 
with eight billion people. Mechanically, human characters can be viewed as 
an articulated chain of limbs, and in fact, most living creatures or animals 
can also be modeled as articulated chains. These characters are sometimes 
referred to as avatars, meaning that they are to represent the user(s) in the 
virtual environment, but strictly speaking, there can be characters or entities 
that do not represent the users and are acting autonomously (e.g., using AI 
or scripted behavior). In this chapter, we explain how to model characters 
(human or animal) as an articulated chain, and how to move the characters 
(or their limbs and bodily parts) to exhibit certain behavior. Note that the 
character behaviors can be initiated through user control (avatar) or 
algorithmically (autonomous agent). 

As already explained, when building a virtual character, as with any virtual 
object, one must consider its form, function, and behavior. The expected 
functions or behaviors may determine the complexity of the form, or vice 
versa. In this chapter, we simply illustrate the process of building a 
humanlike figure with enough detail to carry out various humanlike 
functions. 

Form of a Character 

There are two major things to consider for a form of a character. Like any 
object, characters (human, animal, or any creatures) are created using the 
computer-aided modelers and they are eventually converted into polygonal 
models. Thus, one must consider how much detail is needed depending on 
the requirements of a given application with respect to the number of 
polygons for performance sake. For instance, the facial details might not 
be very important and a simple paste of textures may suffice. It might not be 
necessary to model the fingers or toes, or the muscular landscape of the 
body. This consideration mostly has to do with the outer appearance of the 
character model. 

The other modeling issue is related to the functions and behaviors of 
the character, that is, the determination of the control detail. This refers 
to the problem of which limbs or parts should be controllable by the user or 
by the computer. For instance, even though it might be necessary to model 
the arms (for appearance sake), it might not be necessary to move them. If 
arms are not to be controlled (moved), then they can be modeled together 
with the body (torso) as one object. If, on the other hand, the arms must be 
controlled and moved at the shoulder (but not at the elbows), then, one 
might consider modeling the whole arm as one subobject. 

There are mainly two approaches to modeling a character. One is to 
construct a single mesh for the whole body of the character. This method 
ignores the issue of control detail just mentioned above. The outer appear- 
ance detail is determined by the application requirements. The control detail 
is treated separately. Whichever control detail is used, it will become difficult 
to single out and determine parts of the body (e.g., corresponding limbs) to 
be controlled from the single structureless mesh. 

An alternative approach is to model the character as a composite object, 
as a hierarchy of separate limbs (see Figure 11.1). The level and the detail of 
the hierarchy can be set to that of the control detail. This way, controlling 
the character for movement becomes easier, because we know which model 
corresponds to the desired limbs to be moved. 
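
As a rough illustration, the composite approach can be sketched as a tree of limb nodes. The `Limb` class, its methods, and the limb names below are hypothetical and chosen only for this sketch; they are not taken from Figure 11.1 or from any particular modeling tool or engine.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Limb:
    """One separately modeled part of the character (hypothetical structure)."""
    name: str
    children: List["Limb"] = field(default_factory=list)

    def add(self, child: "Limb") -> "Limb":
        """Attach a child limb and return it so calls can be chained."""
        self.children.append(child)
        return child

    def find(self, name: str) -> Optional["Limb"]:
        """Depth-first search for the limb that is to be controlled."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def subtree(self) -> List[str]:
        """Names of this limb and of everything that moves along with it."""
        names = [self.name]
        for child in self.children:
            names.extend(child.subtree())
        return names


def build_character() -> Limb:
    """Build a small example hierarchy with one controllable arm."""
    torso = Limb("torso")
    torso.add(Limb("head"))
    upper_arm = torso.add(Limb("left_upper_arm"))
    lower_arm = upper_arm.add(Limb("left_lower_arm"))
    lower_arm.add(Limb("left_hand"))
    return torso
```

With such a structure, the depth of the tree can simply follow the chosen control detail: an arm that only moves at the shoulder would be a single node with no children.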



Motion Control of a Character (Function and Behavior) 

In order to control the motion of the character model (which is a mesh or a 
hierarchy of meshes), instead of moving the model data directly, we move 
them indirectly via another control structure, called the skeleton (see 
Figure 11.2). The skeleton is simply a hierarchy of coordinate systems, each assigned 
to a convenient location in the limbs. It is like setting up a virtual bone 
structure where the coordinate systems are set up at the joints of the bones. 
Each of these coordinate systems will constitute a local coordinate system 
for the corresponding limb. To move a limb, we move the corresponding 
coordinate system with respect to the root coordinate system (usually lo- 
cated at the abdomen) or with respect to its neighboring limb coordinate 
system (e.g., upper arm moving relative to the shoulder). The part of the 
mesh that is "bound" to that limb moves together. To summarize the whole 
modeling process, we must do the following. 

1. Model the character as a single mesh or as a hierarchy of limb meshes. 

2. Design a skeleton according to the dimensions of the character model. 

3. Bind coordinate systems in the skeleton to the character model (at a 
predetermined pose, such as standing in T form). 

Figure 11.1. A mesh model of a human figure with each limb modeled separately. 

The binding process amounts to establishing a mapping relationship 
between a given coordinate system and corresponding vertices in the 
model. This process is usually done semi-automatically using the modeling 
tools. The modeling tools allow the user to construct a skeleton (a 
hierarchy of coordinate systems) and to place it onto the character model 
by selecting it with the mouse, moving it into the 3D mesh model, and 
aligning it with the mesh (see Figure 11.3). 

One simple method of binding parts of the mesh to the appropriate 
coordinate systems is to use a distance-based algorithm. The algorithm 
classifies the model vertices into different groups based on the "closest" 
coordinate system. The user then can visualize the mapping and further 
manually edit and adjust the binding relationships, if necessary (see Figure 
11.4). This post-processing is often needed because there are cases where 
distance alone cannot determine the corresponding limbs correctly. For 
instance, some vertices on the side of the torso can be mapped to the arm 
instead of the torso (because they are closer). 

Figure 11.2. A skeleton for a hierarchical human figure. 
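
The distance-based binding can be sketched in a few lines. This is only a minimal illustration: it assumes that "closest" means the Euclidean distance from a vertex to the origin of each coordinate system, and the function name `bind_by_distance` and all coordinates are made up for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def bind_by_distance(vertices: Sequence[Point], joints: Dict[str, Point]) -> List[str]:
    """Assign each vertex to the coordinate system whose origin is closest."""
    binding = []
    for v in vertices:
        closest = min(joints, key=lambda name: math.dist(v, joints[name]))
        binding.append(closest)
    return binding


# Hypothetical coordinate system origins and mesh vertices (in meters).
joints = {"spine": (0.0, 1.2, 0.0), "upper_arm": (0.35, 1.3, 0.0)}
vertices = [
    (0.05, 1.2, 0.1),  # front of the torso
    (0.22, 1.3, 0.0),  # side of the torso, just under the armpit
    (0.40, 1.1, 0.0),  # on the arm
]
binding = bind_by_distance(vertices, joints)
```

In this made-up data, the second vertex belongs to the torso but lies closer to the arm's coordinate system, so it ends up bound to the arm. This is the kind of case that the manual post-processing step is meant to catch.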

Using the modeling tools (e.g., 3DSMax 1), the 3D model is saved together 
with the skeleton and binding information. However, when such a model is 
used in a VR or 3D graphics engine, this information may not be completely 
imported and converted into an appropriate internal data structure (it 
depends on the capabilities of the given engine). For instance, characters 
built with 3DSMax can be imported into the DirectX 2 environment with all 
the relevant information extracted. Even short character animation 
sequences can be modeled in 3DSMax, saved, and imported into DirectX so 
that they can be invoked and replayed later with ease. However, the same 
may not be true for other environments. In many cases, execution 
environments can only import the model data (vertices and faces) and 
disregard the skeleton, binding, and animation data. In such a case, the 
skeleton and binding (and fixed animation) must be specified within the 
engine by explicit coding. 

1 3DSMax is a registered trademark of Discreet, Inc. 

2 DirectX is a registered trademark of Microsoft Corp. 

Figure 11.3. The overall process of building a virtual character: (a) building the mesh 
and the skeleton model; (b) scaling/fitting the skeleton to the mesh model and binding 
parts of the mesh to the skeleton. 

The binding relationship is important in producing a natural animation of 
the limbs. The simple "closest coordinate system" algorithm falls short as 
illustrated in Figure 11.4. In addition, when a limb is rotated around the 
joint by a large angle, it can penetrate into a neighboring limb or, more 
seriously, produce a flattening or tearing effect through extension and 
elongation on the outer side of the joint (see Figure 11.5). An improved 
binding relationship can be established using a weighted assignment scheme 
where vertices are mapped to multiple limbs with different ratios depending 
on their distance to the joints. 

Figure 11.4. Wrongly bound vertices. Vertices in the side of the torso are bound to 
the arm rather than to the spine (because of the short distance to the arm). 
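
A weighted assignment can be sketched as follows. The text does not specify how the ratios are derived from the distances, so the inverse-distance weighting used here is an assumption, as are the names `influence_weights` and `blend`. The blended position is the weighted average of the positions the vertex would take if it were rigidly bound to each limb in turn.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float, float]


def influence_weights(vertex: Point, joints: Dict[str, Point],
                      eps: float = 1e-9) -> Dict[str, float]:
    """Inverse-distance weights, normalized to sum to 1 (one possible choice)."""
    raw = {name: 1.0 / (math.dist(vertex, pos) + eps) for name, pos in joints.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def blend(positions: Dict[str, Point], weights: Dict[str, float]) -> Point:
    """Weighted average of the per-limb candidate positions of one vertex."""
    return tuple(
        sum(weights[name] * positions[name][axis] for name in weights)
        for axis in range(3)
    )
```

A vertex midway between two joints receives equal weights and therefore moves halfway between the two rigid results, which softens the crease at the joint. A vertex far from the joint is dominated by a single limb and behaves almost as in direct assignment.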

If the hierarchical model is used, the mapping relationship is clear. The 
mapping can be established using a modeling tool or through explicit coding. 
When a hierarchy of limbs is used instead of a single mesh, the ends of 
adjoining limbs overlap and can look unnatural (see Figure 11.6). Different 
techniques have been developed to overcome these shortcomings [Mae99]. 

Kinematics 

The hierarchical skeleton structure, just like any other composite object, 
organizes the control structure of the character as a tree (see Figure 11.2), 
where the coordinate systems of the children objects (constituents of the 
parent object) are defined relative to those of the parent object. When a 
motion is applied to the parent, all of its children are affected by it as 
well. In Chapter 2, we examined how to specify the relative placement of two 
static objects using the 4 x 4 transformation matrix. The same principle 
applies here. Between two coordinate systems, one parent and the other 
child, a 4 x 4 transformation matrix can be defined to specify the relative 
location and orientation between them. The only catch is that they rotate 
(or move) relative to each other about a "joint." The 4 x 4 transformation 
matrix will thus contain a variable that varies according to the angular (or 
linear) movement with respect to the joint. Note that there are two types of 
joints, revolute (rotary) and prismatic (linear). Figure 11.7 illustrates 
the differences between them. 

Figure 11.5. Pitfalls of direct assignment. Simple direct assignment can cause 
unnatural self penetration or torn mesh (a). By assigning weights of influence for the 
vertices near the joints, such situations can be avoided to some degree (b). 

Figure 11.6. Elbowing with hierarchy of limbs. Note the unnatural look at the elbow 
when bent. 

Figure 11.7. Two types of joints: (a) prismatic (linear) and (b) revolute (rotary). 
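
To make the role of the joint variable concrete, the sketch below builds the 4 x 4 matrix for each of the two joint types. It assumes, purely for illustration, that the joint axis is the parent's z-axis; the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def revolute_z(angle: float) -> Matrix:
    """Child frame rotated by `angle` (radians) about the parent's z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def prismatic_z(offset: float) -> Matrix:
    """Child frame slid by `offset` along the parent's z-axis."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, offset],
            [0.0, 0.0, 0.0, 1.0]]


def apply(m: Matrix, p: List[float]) -> List[float]:
    """Transform the homogeneous point p = [x, y, z, 1] by the 4 x 4 matrix m."""
    return [sum(m[r][c] * p[c] for c in range(4)) for r in range(4)]
```

In both cases a single number (the angle or the offset) is the only part of the matrix that changes over time; everything else is fixed by the geometry of the skeleton.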

Forward Kinematics 

The kinematics (study of motion without regard to forces) of a character 
motion can be described by the chain of these transformation matrices. In 
forward kinematics, we are interested in using the joint values and comput- 
ing the resulting poses of the limbs in the 3D coordinate system (for render- 
ing; see Figure 11.8). 

Figure 11.8. A pose of an arm with sets of joint angles. 

Figure 11.9. Denavit-Hartenberg parameters (side view and front view). 

The transformation matrices among the coordinate systems in the skeleton 
hierarchy can be set up mechanically using a standard procedure called the 
Denavit-Hartenberg (D-H) notation [Cra86]. According to D-H notation, we 
assign the x-, y-, and z-axes in a particular way to construct the skeleton. 
The z-axis is usually placed along the joint axis, and the x-axis is usually 
placed along the limbs (or links). Figure 11.9 shows the D-H parameters 
given a joint and two links around it. There are four D-H parameters: the 
joint angle $\theta$, the link length $a$, the offset $d$, and the twist 
$\alpha$. For a revolute joint, $\theta$ will be a variable, and for a linear 
joint, the offset $d$ will be the variable instead. The other three 
parameters are constants according to the geometry of the skeleton. By 
setting up the coordinate system this way, the 4 x 4 transformation matrix 
can be found easily using the following equation and the D-H parameters as 
shown in Table 11.1. 

\[
{}^{i-1}T_{i} =
\begin{bmatrix}
\cos\theta_i & -\sin\theta_i & 0 & a_{i-1} \\
\sin\theta_i \cos\alpha_{i-1} & \cos\theta_i \cos\alpha_{i-1} & -\sin\alpha_{i-1} & -\sin\alpha_{i-1}\, d_i \\
\sin\theta_i \sin\alpha_{i-1} & \cos\theta_i \sin\alpha_{i-1} & \cos\alpha_{i-1} & \cos\alpha_{i-1}\, d_i \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

Table 11.1. Denavit-Hartenberg parameters that relate the (i-1)th frame and the ith 
frame for setting up ${}^{i-1}T_{i}$ [Par02] 

| Name | Symbol | Description |
|---|---|---|
| Link offset | $d_i$ | Distance from $x_{i-1}$ to $x_i$ along $z_i$ |
| Joint angle | $\theta_i$ | Angle between $x_{i-1}$ and $x_i$ around $z_i$ |
| Link length | $a_{i-1}$ | Distance from $z_{i-1}$ to $z_i$ along $x_{i-1}$ |
| Link twist | $\alpha_{i-1}$ | Angle between $z_{i-1}$ and $z_i$ around $x_{i-1}$ |

For each joint along the root-to-leaf path in the skeleton hierarchy, a table 
can be constructed listing the values of the D-H parameters (as shown in 
Table 11.2). We can establish a transformation matrix between two 
neighboring coordinate systems (assigned at the joints) easily by 
substituting the corresponding row of Table 11.2, the D-H parameter table, 
into the equation above (assume a default initial pose). 

Any coordinate at the end of the mechanism can be converted into the 
coordinate with respect to the root of the mechanism, whose location or 
orientation is usually known with respect to the World coordinate. Thus, 
using the chain of these transformation matrices and the joint variables, 
anything along the chain of the mechanism can be expressed in the World 
coordinates and be rendered on the screen. Thus, because the local 
coordinate of the endpoint of the articulated chain in Figure 11.10 is simply 
${}^{2}(L_3, 0, 0)$, 

Endpoint of articulated chain in global coordinate (Frame 0) 

\[
{}^{0}T_{1}\; {}^{1}T_{2}\; {}^{2}(L_3, 0, 0)
\]

When there are multiple degrees of freedom in one joint, we can establish 
multiple coordinate systems, one for each degree of freedom, at the location 
of the joint (there will be no link length between these coordinate systems). 

The forward kinematics formulation and the 4 x 4 transformation matrices 
can easily be used with joint velocities or joint accelerations to compute 
displacements of limbs of the character in local or global coordinates. The 
same kinematic equations of motion seen in Chapter 10 apply here. 

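\[
\begin{aligned}
X &= f(\Theta) \\
X' &= f'(\Theta) \\
X_{New} &= X_{Old} + f'(\Theta)\,\delta t
\end{aligned}
\]

where $\Theta$ represents the joint angles, $X$ represents the Cartesian 
coordinates of a point on the articulated chain, a prime denotes 
differentiation with respect to time, and $\delta t$ is a small time step. 

Table 11.2. D-H Parameter Table for the Articulated Chain Shown in Figure 11.10 

| Joint number | $a_{i-1}$ | $\theta_i$ | $d_i$ | $\alpha_{i-1}$ |
|---|---|---|---|---|
| 1 | 0 | $\theta_1$ | 0 | 0 |
| 2 | 0 | $\theta_2$ | $L_1$ | -90° |
| 3 | $L_2$ | $\theta_3$ | 0 | 0 |

Figure 11.10. An example of an articulated chain and coordinate system assignment. 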
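
As a numerical illustration, the sketch below builds one D-H transformation per row of Table 11.2 and chains them to express a point of the last frame in Frame 0. The link lengths, joint angles, and the local point are made-up values, and the function names are hypothetical; the point is assumed here to be given in the frame of the last joint in the table.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dh_transform(a_prev: float, alpha_prev: float, d: float, theta: float) -> Matrix:
    """4 x 4 transform for one D-H table row; it maps coordinates given in
    frame i into frame i-1 (angles in radians)."""
    ca, sa = math.cos(alpha_prev), math.sin(alpha_prev)
    ct, st = math.cos(theta), math.sin(theta)
    return [
        [ct, -st, 0.0, a_prev],
        [st * ca, ct * ca, -sa, -sa * d],
        [st * sa, ct * sa, ca, ca * d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4 x 4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def chain_point(thetas: Sequence[float], l1: float, l2: float,
                local_point: Sequence[float]) -> List[float]:
    """Express a point given in the last frame in the root frame (Frame 0)."""
    # Rows of Table 11.2 as (a_{i-1}, alpha_{i-1}, d_i, theta_i).
    rows = [
        (0.0, 0.0, 0.0, thetas[0]),
        (0.0, -math.pi / 2, l1, thetas[1]),
        (l2, 0.0, 0.0, thetas[2]),
    ]
    total = [[float(r == c) for c in range(4)] for r in range(4)]  # identity
    for a_prev, alpha_prev, d, theta in rows:
        total = matmul(total, dh_transform(a_prev, alpha_prev, d, theta))
    p = list(local_point) + [1.0]  # homogeneous coordinates
    return [sum(total[r][c] * p[c] for c in range(4)) for r in range(3)]
```

For example, `chain_point([0.0, 0.0, 0.0], 1.0, 2.0, [0.5, 0.0, 0.0])` evaluates the chain at its default pose. Changing only the first angle rotates the whole chain about the root z-axis, which shows how a motion applied to a parent frame carries all of its children with it.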

Although the forward kinematic motion formulation can be used to 
control how a character moves by specifying the appropriate joint angle 
profiles in time, this method is not very intuitive (see Figure 11.11). For 
instance, to specify a walking motion of a human character with a thigh, 
knee, and an ankle, three joint angles must be specified with respect to time. 
It is not obvious what kinds of series of values will produce a natural 
humanlike walking motion. One method of producing natural motion is to 
use key framing. The user manually specifies a few intermediate poses of the 
leg segments, and the joint angles at these intermediate poses are extracted. 
Then the rest of the required series of joint angles is obtained by interpolat- 
ing between these key frame joint angles. This method is more intuitive 
(although manual) because the user works with the leg segments visually in 
the 3D space, not in the numerical joint space. 

Figure 11.11. A human leg animation sequence. Figuring out the continuous joint 
angle profile in time is not intuitive (upper). The modeler can specify key postures 
and figure out the intermediate joint angles by interpolation. Still, this can be 
imprecise depending on the number of key postures (lower). 
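
Key framing can be sketched as a simple interpolation in joint space. The example below assumes linear interpolation between neighboring key frames, which is only one possible choice (smoother curves are often preferred); the function name and the key-frame values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

KeyFrame = Tuple[float, Sequence[float]]  # (time, joint angles at that time)


def interpolate_pose(keys: Sequence[KeyFrame], t: float) -> List[float]:
    """Linearly interpolate joint angles between the two key frames around t.

    `keys` must be sorted by time; times outside the range are clamped."""
    times = [k[0] for k in keys]
    if t <= times[0]:
        return list(keys[0][1])
    if t >= times[-1]:
        return list(keys[-1][1])
    hi = bisect_right(times, t)  # index of the first key frame after t
    (t0, a0), (t1, a1) = keys[hi - 1], keys[hi]
    u = (t - t0) / (t1 - t0)     # 0 at the earlier key, 1 at the later key
    return [x0 + u * (x1 - x0) for x0, x1 in zip(a0, a1)]


# Made-up key frames for (thigh, knee, ankle) angles in degrees.
leg_keys = [
    (0.0, [0.0, 0.0, 0.0]),
    (1.0, [30.0, 60.0, -10.0]),
    (2.0, [0.0, 20.0, 0.0]),
]
```

Sampling `interpolate_pose(leg_keys, t)` at every simulation tick yields the full joint angle profile from only three manually specified poses. As the figure caption notes, the result is only as precise as the number of key postures allows.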

Inverse Kinematics 

It would be nice, with the hierarchical chain of limbs, if the user could simply 
specify the motion profile of the endpoint of the articulated chain (such as 
the hand) in the 3D Cartesian space, and the appropriate joint angles be 
computed automatically. This is called the inverse kinematics problem: 
figuring out the joint angles from the end effector position (any point 
along the chain of limbs; see Figure 11.12). Thus forward and inverse 
kinematics equations can be written as 

\[
\begin{aligned}
X &= f(\Theta) && \text{(forward kinematics)} \\
\Theta &= f^{-1}(X) && \text{(inverse kinematics)}
\end{aligned}
\]

where $\Theta$ represents the joint angles and $X$ represents the 3D 
Cartesian coordinate values of the end effector. 

The problem is that a closed form solution of $f^{-1}$ is very difficult to 
obtain when the number of joints exceeds three. $f(\Theta)$ already consists 
of nonlinear terms including sine and cosine functions. Moreover, there may 
be many sets of joint angles that can produce a given location or 
orientation of the end effector. Thus, one way to overcome this problem is to 
solve the problem in the differential space, which linearizes the problem 
space. From the forward kinematics equations, we can obtain, by 
differentiating the equation with respect to time, 

\[
X' = J(\Theta)\,\Theta'
\]

$J(\Theta)$, which relates the joint angle velocity to the end effector 
velocity, is called the Jacobian. The inverse of $J(\Theta)$ is computed to 
obtain the differential form of the inverse kinematics equation. Because 
this technique linearizes a nonlinear problem, the equation is only an 
approximate solution, valid only at a given joint configuration and a given 
instant of time ($J(\Theta)$ changes in time). 

\[
\Theta' = J^{-1}(\Theta)\,X'
\]

Thus, we can compute the joint angles using the above formulation in the 
following way. Starting from a known initial configuration $\Theta_{init}$ 
and given a goal position of the end effector $X_{Goal}$, 

$\Theta_{Current} = \Theta_{init}$ /* initialization */ 

LOOP until ($|X_{Goal} - X_{Current}|$ < small threshold) /* end loop when arrived at goal */ 

1. $X_{Current} = f(\Theta_{Current})$ /* forward kinematics */ 

2. Compute $\delta X = X_{Goal} - X_{Current}$ 

3. Compute $J(\Theta_{Current})$ and $J^{-1}(\Theta_{Current})$ 

4. Compute $\Theta' = J^{-1}(\Theta_{Current})\,X'$ 

5. $\Theta_{Current} = \Theta(t + \delta t) = \Theta_{Current} + \Theta'\,\delta t$ 

That is, at every simulation tick, the new updated end effector position is 
computed from the old joint angles using the forward kinematics equation. 
Then a new $\delta X$ is computed (from which the end effector velocity $X'$ 
toward the goal is taken), and $J$ and $J^{-1}$ are newly computed with the 
old $\Theta$. Using the new $J$ and $J^{-1}$, a new $\Theta'$ is computed, 
and the joint angle is updated for that small time duration $\delta t$. In 
general, the matrix $J$ is not square or is rank deficient (many solutions 
exist), so $J^{-1}$ may not exist, and we solve for the pseudo inverse 
$J^{+}$ instead. 

\[
J^{+} = J^{T}\,(J J^{T})^{-1}
\]
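
The loop can be sketched for a small redundant chain. The example below is an illustration under stated assumptions rather than a definitive implementation: it uses a hypothetical planar arm with three revolute joints (so the Jacobian is 2 x 3 and the pseudo inverse is needed), made-up link lengths, and it takes the displacement $X'\,\delta t$ in each tick to be a fixed fraction (`gain`) of the remaining error $\delta X$.

```python
import math
from typing import List, Sequence

LINKS = (1.0, 1.0, 0.5)  # hypothetical link lengths of a planar 3-joint arm


def fk(theta: Sequence[float]) -> List[float]:
    """Forward kinematics: end effector position (x, y) for joint angles theta."""
    x = y = acc = 0.0
    for length, t in zip(LINKS, theta):
        acc += t  # absolute orientation of the current link
        x += length * math.cos(acc)
        y += length * math.sin(acc)
    return [x, y]


def jacobian(theta: Sequence[float]) -> List[List[float]]:
    """2 x 3 Jacobian; column j holds d(x, y) / d(theta_j)."""
    cum, acc = [], 0.0
    for t in theta:
        acc += t
        cum.append(acc)
    jac = [[0.0] * 3, [0.0] * 3]
    for j in range(3):
        for k in range(j, 3):  # joint j moves link j and every later link
            jac[0][j] -= LINKS[k] * math.sin(cum[k])
            jac[1][j] += LINKS[k] * math.cos(cum[k])
    return jac


def pseudo_inverse(jac: List[List[float]]) -> List[List[float]]:
    """J+ = J^T (J J^T)^-1 for a 2 x n Jacobian (fails at singular poses)."""
    n = len(jac[0])
    a = sum(jac[0][k] * jac[0][k] for k in range(n))
    b = sum(jac[0][k] * jac[1][k] for k in range(n))
    d = sum(jac[1][k] * jac[1][k] for k in range(n))
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]  # (J J^T)^-1
    return [[jac[0][k] * inv[0][c] + jac[1][k] * inv[1][c] for c in range(2)]
            for k in range(n)]


def solve_ik(theta_init: Sequence[float], x_goal: Sequence[float],
             threshold: float = 1e-4, max_iters: int = 1000,
             gain: float = 0.5) -> List[float]:
    """Iterate the differential inverse kinematics update until near the goal."""
    theta = list(theta_init)
    for _ in range(max_iters):
        x_cur = fk(theta)                                   # step 1
        dx = [x_goal[0] - x_cur[0], x_goal[1] - x_cur[1]]   # step 2
        if math.hypot(dx[0], dx[1]) < threshold:
            break
        j_plus = pseudo_inverse(jacobian(theta))            # step 3
        # Steps 4 and 5: Theta' * dt, with X' * dt assumed to be gain * dx.
        d_theta = [gain * (row[0] * dx[0] + row[1] * dx[1]) for row in j_plus]
        theta = [t + dt for t, dt in zip(theta, d_theta)]
    return theta
```

Because each step is only a linear approximation, the loop takes several small steps instead of one large one. The iteration cap guards against targets that are out of reach, for which the threshold would never be met.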

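The use of the pseudo inverse produces a particular selection of $\Theta'$ 
that happens to have the characteristic of minimizing the joint angle rates. 
Other techniques exist that look for a different $\Theta'$. An example is 
presented for the simple kinematic chain shown in Figure 11.13. Suppose that 
we have the following forward kinematics equations 

\[
\begin{aligned}
X &= R\sin\theta_1 \\
Y &= -R\cos\theta_1 \\
Z &= d - T
\end{aligned}
\]

where $R$ and $\theta_1$ are the joint variables (of the prismatic and the 
rotary joint, respectively) and $d$ and $T$ are constants (link length and 
offset). Instead of trying to figure out the closed form solution for $R$ 
and $\theta_1$, we differentiate the equations with respect to time. 

\[
\begin{aligned}
dX &= \sin\theta_1\,dR + R\cos\theta_1\,d\theta_1 \\
dY &= -\cos\theta_1\,dR + R\sin\theta_1\,d\theta_1 \\
dZ &= 0
\end{aligned}
\]

In matrix form, the above equations become 

\[
\begin{bmatrix} dX \\ dY \\ dZ \end{bmatrix}
= J \begin{bmatrix} dR \\ d\theta_1 \end{bmatrix},
\qquad
J = \begin{bmatrix}
\sin\theta_1 & R\cos\theta_1 \\
-\cos\theta_1 & R\sin\theta_1 \\
0 & 0
\end{bmatrix}
\]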
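
A Jacobian derived by hand is easy to get wrong, so it can help to check it numerically. The sketch below differentiates X = R sin θ1, Y = -R cos θ1, Z = d - T analytically and compares the result with central finite differences. The values chosen for `d` and `T` and the function names are assumptions made for the example.

```python
import math
from typing import List, Tuple


def fk(r: float, theta1: float, d: float = 1.0, t: float = 0.2) -> Tuple[float, float, float]:
    """Forward kinematics of the chain with one prismatic (r) and one rotary
    (theta1) joint; d and t are constants with made-up default values."""
    return (r * math.sin(theta1), -r * math.cos(theta1), d - t)


def jacobian(r: float, theta1: float) -> List[List[float]]:
    """Analytic 3 x 2 Jacobian with respect to (r, theta1)."""
    return [
        [math.sin(theta1), r * math.cos(theta1)],
        [-math.cos(theta1), r * math.sin(theta1)],
        [0.0, 0.0],
    ]


def numeric_jacobian(r: float, theta1: float, h: float = 1e-6) -> List[List[float]]:
    """Central-difference approximation of the same Jacobian."""
    cols = []
    for dr, dth in ((h, 0.0), (0.0, h)):
        plus = fk(r + dr, theta1 + dth)
        minus = fk(r - dr, theta1 - dth)
        cols.append([(p - m) / (2.0 * h) for p, m in zip(plus, minus)])
    # cols holds one column per joint variable; transpose to 3 x 2.
    return [[cols[c][row] for c in range(2)] for row in range(3)]
```

The last row of the Jacobian is zero because Z does not depend on either joint variable, so this chain cannot move its end effector in Z at all. The matrix is therefore not invertible in the ordinary sense, which is one concrete reason a pseudo inverse is used.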



There exist more elaborate inverse kinematics methods and we do not cover 
them here. Incorporating dynamics for an articulated chain is an even more 
difficult problem. A direct motion capture (direct recording of joint angles 
using sensors worn by an actor; see Figure 11.14) is often used to overcome 
this problem. 

Figure 11.13. A simple kinematic chain with one rotary joint and one prismatic joint. 

Figure 11.14. A motion capture system. 

Summary 

Animating human characters is important because any realistic virtual 
environment will need living and moving entities in it. Character animation 
starts with modeling the character's form as a mesh or a hierarchy of limb 
meshes and defining a skeleton structure for control. Appropriate parts of 
the character mesh are bound to the skeleton and can be moved by moving 
the underlying skeleton. The skeleton is simply a hierarchy of coordinate 
systems, and the hierarchy can be specified methodically by using the 
Denavit-Hartenberg parameters and conventions. The forward kinematics can 
be solved by formulating the relative orientation and location of 
neighboring coordinate systems (or bones) in terms of the changing joint 
angles. Using the forward kinematics formulation, a motion can be created 
by changing the joint angle values. Inverse kinematics lets users specify 
the motion of a convenient point on the articulated chain (such as its end 
effector) and figures out the joint angles (the reverse problem of forward 
kinematics). It is usually more intuitive to specify motion with end 
effector positions and compute the joint angles instead. Generating 
realistic dynamic motions with forces and torques is very difficult, and 
often direct motion capture is used and adapted in real time. 

Pondering Points 

• Model a ball-socket shoulder joint using the Denavit and Hartenberg 
convention. 

• How can we use pre-captured (or modeled) motion data in a real-time 
interactive virtual environment? 

• How can motion capture data captured for a certain actor be applied to a 
differently sized character? 

• How can we connect two different motion profiles in a smooth manner? 

• How would it be possible to model and simulate the bulging landscape of 
the skin due to muscle movements? 

• Facial movement mostly occurs by muscle movements. However, the 
animation technique introduced in this chapter uses a bone-(skeleton)- 
based animation. What kind of underlying control structure might we 
need for facial animation? 

• A face has a distinct landscape (e.g., high nose, cheekbones, round eyes). 
Modeling a face as a flat surface and pasting a picture may fall short of 
producing realistic effects. We can put more effort into modeling a face 
with more geometric details. How can we paste a picture more effectively 
on a facial model with more geometric details? 

• What do you think is the minimum level of detail for representing humans 
or animals in virtual environments? Discuss it in terms of not only visual 
features, but others (aural, behavioral, haptic, etc.). 

• For the Ship Simulator example, animate an arm (touching the Steering- 
Wheel and the Engine Lever) using a simple inverse kinematics model (e.g., 
three degrees of freedom). 

Virtual Space, the Final Frontier 



Other Important Subareas of Virtual Reality (VR) 

In this book, I have covered some essential topics in building VR systems. It 
started with laying out the requirements and making rough specifications 
using storyboards and other constructs. The first phase of development 
focused on verifying the basic functionality and performance of the intended 
system. This led to the next phases of development where more VR-oriented 
elements would be added such as 3D multimodal interaction, collision 
handling, physical simulations, and character animations. By no means 
are the topics or components of VR covered in this book exhaustive. For 
instance, one important aspect of VR in terms of promoting the sense of 
user-felt presence is the interaction with other "living" entities (rather than 
just manipulating static objects). This is possible mainly by: (1) building a 
networked and shared virtual environment and having multiple users inter- 
act with one another, and/or (2) employing autonomous entities controlled 
by artificial intelligence for believable interaction [Sin99;Rab02]. Another 
emerging area in VR is image-based rendering [Mcm95]. The goal of image- 
based rendering is to model virtual environments using a set of images. The 
image from a viewpoint where an image was not sampled is generated, for 
instance, by mixing or warping already sampled images from nearby 
viewpoints. However, image-based rendering imposes a high computational 
load, requires large storage capacity, and is limited in providing 
interactivity. 

A closely related area is computer vision. Computer vision is important 
for VR in at least the areas of (1) vision-based tracking [Reh94], and (2) 
model reconstruction (also known as image-based modeling). Vision-based 
tracking is attractive because the user is untethered, and cameras are 
becoming more and more ubiquitous. The heavy computational load needed 
for vision-based algorithms is becoming less of an issue with the ever- 
increasing computational power of today's PCs. Using textures or environ- 
ment maps is a simple form of image-based modeling. A more advanced 
form is to extract 3D models from images for quick construction of 
photorealistic 3D models [Deb96]. A more futuristic avenue for VR is opening 
up with the advances in neurotechnology and brain science, to be more specific, 
in efforts to directly interface with the human nervous system. For instance, 
neurotechnology has already advanced to a level to restore (limited) vision 
to the blind [Wir02]. One obvious application of such technology is in 
creating display systems for VR by tapping directly into our sensory nervous 
system (input to the brain). However, interpreting neural signals (for con- 
trol) would be a far more challenging problem (output from the brain). 

Finally, in order for VR to take its place as one of the mainstream media 
for digital content, it is vital that research and industry produce more usable 
and effective authoring tools. It is easy to recognize the successes of the 
WYSIWYG (What You See Is What You Get) types of computer-based 
tools for creating documents, presentations, movies, 3D models, and even 
short animations. Likewise, the proliferation of VR technology and 
VR-based content will also depend on WYFIWYG (What You Feel Is What 
You Get) [LeeG04] types of authoring tools that can help developers 
compose, reuse, and integrate different components at the level of virtual objects 
and entities, and at the level of systemwide functionality (e.g., sensing, 
display, collision, performance management, etc.). 

Is VR Really Any Good? 

In this book, I have claimed that the two main pillars of VR (as distinguished 
from mere 3D interactive graphics) were 3D multimodality and the sense of 
user-felt presence. However, partly due to the different definitions of pres- 
ence put forth by a number of different people, there has been much debate 
over whether other conventional media (such as books, movies, 3D games) 
can induce the sense of presence (or immersion) as well. Such an issue is 
important for identifying and establishing the uniqueness and value of 
virtual reality systems. For instance, if indeed it is possible to induce 
psychological immersion by manipulation of story, plots, and abstract 
interaction, then digital content such as interactive stories or games can 
be conveyed sufficiently using conventional desktop interfaces rather than 
expensive VR setups that are often difficult to use and engineer. 
This warrants a bit more explanation about the concept of presence as 
follows. 

VR for Spatial Presence 

One of the important and defining goals of virtual reality systems is to create 
presence and to fool the user into believing that he or she is, or is doing 
something, "in" the synthetic environment. Many researchers have defined and 
explained presence in different ways [ISP04]. Historically, in the context of 
virtual reality, the concept of presence has been much associated with spatial 
perception as its informal definition of "feeling of being there" suggests 
[Hee92;ISP04]. Similarly, many studies have identified system elements 
that contribute to enhanced user-felt presence, and many of them are spatial 
or perceptual cues such as providing a wide field of view display, head- 
tracking, stereoscopy, 3D sound, proprioception, maps/landmarks, and spa- 
tial interaction [ISP04]. 

Other studies in presence have challenged this view and attempted to 
widen the concept to include psychological immersion, thus linking higher- 
level and "nontechnological" elements (processed in a top-down fashion in 
our brain) to presence such as story and plots, flow, attention and focus, 
identification/empathy with the characters, social interaction, emotion, pre- 
knowledge, and so on [ISP04;Riv03;Sas03]. One can argue that there is a 
(evolving) dichotomy within the concept of presence as illustrated in Table 
E.1 (the table should be taken as an illustration; that is, in reality, the 
separation is not as clear cut). 

Thus, scholars now generally agree that there are different types of pres- 
ence, such as spatial presence, social presence, and psychological (or concep- 
tual) presence [ISP04]. Among them, the spatial presence (also known as 
physical presence) refers to the sense of physical and concrete space, often 
dubbed as the sense of being there (e.g., virtual environment). Spatial pres- 
ence bears particular importance to virtual reality "technologists" interested 
in providing location-based experiences, because it is seemingly (although 
not proven) more dependent on the "form" (or system/hardware/technical) 
factors of the VR system. Spatial presence is formed as a product of a 
bottom-up perceptual process that gathers spatial cues to actively place 
and register the user in the surrounding environment. Thus, in general, 
employing expensive VR devices can be superfluous if the purpose of the 
system is nonspatial. On the other hand, VR as a technology will have a 
unique value in providing strong spatial context for those applications that 
require it, such as many training and educational systems. For instance, a 
virtual training system for finding the fire exit will require provision of strong 
spatial cues so that training effects can transfer to the real situation. 

Table E.1. The dichotomy within the concept of presence [LeeS04] 

| | Nonspatial presence | Spatial presence |
|---|---|---|
| Nature | Conceptual/cognitive/psychological/social (e.g., feeling of being in an abstract space or part of a story, "I felt like I was James Bond") | Perceptual/physiological (e.g., feeling of being in concrete space, "I felt like I was on the moon") |
| Individual difference | More subjective | More objective |
| Space [Wat01] | Conceptual/abstract | Concrete/physical |
| Formation | Formed as byproduct of voluntary and conscious top-down processing (high level); involves rational, abstract, and logical reasoning | Formed as byproduct of involuntary bottom-up processing of raw sensory cues (low level); involves reflexive behavior responsive to stimuli |
| Factors | Nontechnological (content): story, plot, attention, focus, abstract interaction, role playing, emotion, social interaction, (deliberate) nonrealism, etc. | Technological (form): display, bodily interaction, FOV, motion, shadow, graphic realism, texture resolution, simulation/motion realism, exposure time, etc. |